Psychometrika 





CONTENTS 


WHY DO PEOPLE FACTOR?
     KARL J. HOLZINGER


NOTE ON THE RELATIONSHIP BETWEEN INTERNAL
CONSISTENCY AND TEST-RETEST ESTIMATES
OF THE RELIABILITY OF A TEST
     ROBERT W. B. JACKSON


“BLOCKS TEST” OF MULTIPLE RESPONSE
     C. H. MCCLOY


AN ANALYSIS OF LEARNING DATA WHICH DISTINGUISHES
BETWEEN INITIAL PREFERENCE AND LEARNING ABILITY
     HAROLD GULLIKSEN


TESTS OF STATISTICAL HYPOTHESES IN THE CASE OF
UNEQUAL OR DISPROPORTIONATE NUMBERS
OF OBSERVATIONS IN THE SUBCLASSES
     FEI TSAO


A GENERAL FACTOR IN IMPROVEMENT WITH PRACTICE
     K. W. HEESE








VOLUME SEVEN SEPTEMBER 1942 NUMBER THREE 


























PSYCHOMETRIKA—VOL. 7, NO. 3 
SEPTEMBER, 1942 


WHY DO PEOPLE FACTOR?


KARL J. HOLZINGER 


UNIVERSITY OF CHICAGO 


This is a very simple explanation of factor analysis primarily 
for the “factorial layman.” The interpretation of factors and the 
comparisons of various factor solutions are illustrated with a hypo- 
thetical example so designed that the reader can visualize all the re- 
lationships in a two-dimensional graph. 


Some time ago a psychologist told me that factors were meaning- 
less abstractions and that people who found them were foolish to do 
so. The chief point of this article is to show not only that factors are 
meaningful abstractions, but that people are foolish if they don’t 
find them. 

The fundamental ideas in factor analysis are very simple, but
their description in the mathematical symbolism employed by factor
analysts is so forbidding that many people do not get far enough in
the subject to understand the simple fundamental ideas. I have no
quarrel with the rigorous mathematical presentation of factor
analysis for the student and expert, but I do feel that the “factorial
layman” should and can be shown some of the simpler aspects of the
subject in a language which he can understand.

TABLE 1
RAW SCORES OF FOUR PERSONS ON FIVE TESTS ($X_{ji}$)

               Person
Test      a     b     c     d     Mean
  1       1     2     3     4      2.5
  2       1     6     6     3      4.0
  3      11    26    36    43     29.0
  4       9    14    24    37     21.0
  5       4    20    21    13     14.5

Some statistical knowledge for any understanding of factor 
analysis is of course required. As a minimum for the present discus- 
sion it will be assumed that the reader understands the following con- 
cepts: averages, dispersion, correlation, reference axes, and coordi- 
nates. The last two ideas will be reviewed briefly. 




In order to illustrate various phases of factor analysis so that 
they may be clearly understood, an artificial problem has been so se- 
lected that the various factor solutions may be observed in a graph on 
the plane of the paper. The actual analysis of the data will be carried 
through here only to the determination of one factor and the scores 
for this factor. For the reader who may be a factor analyst, a tech- 
nical appendix has been added in which are described the essential 
ideas of factor analysis. The general reader may ignore this appendix 
and the few technical statements made parenthetically in the text. 

The data of Table 1 may be represented graphically in two ways.
The first of these is the familiar “scatter diagram.” Thus in Fig. 1
the two reference axes are the tests taken at right angles (for
simplicity, only orthogonal reference systems will be considered in
this paper). The scores on tests 1 and 2 are considered as the
coordinates of the point, i.e., distances measured parallel to the
appropriate axes. Thus for person b the coordinates are 2 along the
horizontal axis, and 6 along the vertical axis. There will be as many
points in such a diagram as there are persons.

[FIGURE 1. Scatter diagram of persons a, b, c, d, with test 1 on the
horizontal axis and test 2 on the vertical axis.]

In the next form of representation we choose four axes corre- 
sponding to the four persons as our reference system. Let us begin 
our interpretation first, however, for three such axes and let us call 
these “person axes,” a, b, c. In this representation, each point rep- 
resents a test. 

Two of these axes have been indicated in Fig. 2 and the third
($Oc$) is to be imagined drawn perpendicular to the plane of the paper.
The scores of persons a, b, and c are now considered as the
coordinates of a point in the reference system of three person axes.
Thus for test 1 the point $P_1(1, 2, 3)$ has a coordinate of 1 along
the $Oa$ axis, 2 along the $Ob$ axis, and 3 along the $Oc$ axis. The
point is therefore three units above the plane of the paper. The
projection of the point in this plane is shown in Fig. 2. If the other
four points are similarly determined, using the same three-space axis
system, we can imagine all five points in our three-space. We can also
imagine five lines (vectors) drawn from the origin to each of these
points, and the configuration of points or vectors is sometimes called
the vector representation of the data. Since a test may be considered
as a set of scores, e.g., test 1: (1, 2, 3), the point $P_1$ with these
coordinates gives a geometric representation of test 1, and similarly
for the other tests.

[FIGURE 2. Projection of the test point $P_1$ on the plane of the
person axes $Oa$ and $Ob$.]

Thus far we have been able to visualize the foregoing vector
representation of the data because we have been working in the ordinary
space of three dimensions. The above geometric notions, however,
may be generalized to any number of dimensions. Thus if we consider
the scores of person d as well as those of persons a, b, and c, we take
another axis $Od$ perpendicular to the other three axes. Each test is
now considered as consisting of four scores which are the coordinates
of a point in four-space. Of course we cannot imagine representation
in a four-space, but we can think of it as analogous to the
three-space, two-space, or one-space, which we can imagine.

Let us then consider that the five tests of Table 1 are represented
geometrically by five points or vectors in a four-space. These five
vectors will all be different in length. (The lengths are the square
roots of the sums of the squares of the coordinates, e.g.,

     $OP_1 = \sqrt{1 + 4 + 9 + 16} = \sqrt{30}$.)

It is very convenient for future work to take the scores as deviates
from means and also to “normalize” them so that the lengths of the
vectors will all be unity. (The scores may then be interpreted as
direction cosines.) Table 2 is first prepared by subtracting the mean
of each test from the scores given in Table 1. Each number in the
right-hand column of Table 2 is the square root of the sum of squares
of scores in the same row. Each row of Table 3 is then computed by
dividing the numbers in each row of Table 2 by the square root in that
row. The last column of Table 3 is a check to show that the sum of
the squares of the “normalized” deviations for each row is equal to
unity. (Normalized scores may also be interpreted as standard scores
each divided by $\sqrt{N}$, here 2.)

TABLE 2
DEVIATIONS FROM MEANS

                  Person                  Square Root of
Test       a       b       c       d      Sum of Squares
  1      -1.5     -.5      .5     1.5         2.23607
  2      -3.0     2.0     2.0    -1.0         4.24264
  3     -18.0    -3.0     7.0    14.0        24.04163
  4     -12.0    -7.0     3.0    16.0        21.40093
  5     -10.5     5.5     6.5    -1.5        13.60147

The last row of Table 3 is obtained as follows: Find the total of
the five scores for each column as indicated in the next-to-the-last
row. Then “normalize” these totals by dividing each entry by the
square root of the sum on the right. The divisor is thus
$\sqrt{17.3187} = 4.1616$.

TABLE 3
NORMALIZED DEVIATIONS FROM MEANS ($w_{ji}$)

                     Person
Test         a        b        c        d      Sum of Squares
  1       -.6708   -.2236    .2236    .6708         .9999
  2       -.7071    .4714    .4714   -.2357        1.0000
  3       -.7487   -.1248    .2912    .5823        1.0000
  4       -.5607   -.3271    .1402    .7476         .9999
  5       -.7720    .4044    .4779   -.1103        1.0001
Total    -3.4593    .2003   1.6043   1.6547       17.3187
$C_1$     -.8312    .0481    .3855    .3976         .9999
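The arithmetic behind Tables 2 and 3 can be retraced in a few lines. The following is a minimal sketch, assuming NumPy is available; the array name `X` and the helper `normalize_rows` are illustrative choices and not part of the original paper. The score of person c on test 3 is entered as 36, the value implied by the deviation of 7.0 in Table 2.

```python
import numpy as np

# Raw scores of Table 1: rows are tests 1-5, columns are persons a-d.
X = np.array([
    [1, 2, 3, 4],
    [1, 6, 6, 3],
    [11, 26, 36, 43],
    [9, 14, 24, 37],
    [4, 20, 21, 13],
], dtype=float)


def normalize_rows(m):
    """Divide each row by the square root of its sum of squares."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)


# Table 2: deviations of each test from its own mean, with row lengths.
deviations = X - X.mean(axis=1, keepdims=True)
lengths = np.linalg.norm(deviations, axis=1)

# Table 3, first five rows: normalized deviations (direction cosines).
w = normalize_rows(deviations)

# Table 3, last two rows: column totals and the normalized totals C1.
totals = w.sum(axis=0)
divisor = np.linalg.norm(totals)
c1 = totals / divisor

print(np.round(lengths, 5))
print(np.round(w, 4))
print(np.round(totals, 4), round(float(divisor), 4))
print(np.round(c1, 4))
```

Because the sketch carries full floating-point precision, its output can differ from the printed tables in the last decimal place, where the paper rounds intermediate values to four places.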


Let us now consider the entries in Table 3. Various interpreta- 
tions of the scores in the first five rows will first be made. Let us take 
the first score on the first line by way of illustration. 

1. The number -.6708 may be interpreted as the score (in
normalized form) of person a on test 1.

2. The number -.6708 is the coordinate of $P_1$ on the $Oa$ axis.

3. (In general the numbers in the first five rows form a matrix
$w_{ji}$ of direction cosines of test axes $j = 1, 2, 3, \ldots, n$ with
respect to person axes $i = 1, 2, 3, \ldots, N$.)

Now instead of considering the scores of persons on each test,
we might consider the scores of persons on the sums of all tests, as
indicated in the “total” row of Table 3. In order to make these “total”
scores comparable with scores in the first five lines, they have been
normalized as shown above and are recorded in the last row of the
table. The same normalized values would have been obtained if we
had used “averages” instead of totals. These last numbers may
therefore be interpreted as the scores of the individuals on an “average
test”, i.e., a test whose coordinates are the averages of the coordinates
of the individual tests. Such an “average test” is known as the first
centroid factor, and the scores in the last row are the normalized
factor scores of the four persons. We have thus been making a factor
analysis of scores rather than one of correlations among variables*
as is ordinarily done, but the results would have been identical if the
correlations of the five variables had been analyzed by the usual
methods (with unities in diagonals).

TABLE 4
CENTROID FACTOR SOLUTION

             Factor
Test     $C_1$     $C_2$
  1      .8997     .4364
  2      .6984    -.7155
  3      .9601     .2796
  4      .8016     .5978
  5      .8015    -.5979

TABLE 5
FACTOR SCORES FOR TWO CENTROIDS

               Factor
Person     $C_1$     $C_2$
  a       -.8312     .1767
  b        .0481    -.6117
  c        .3855    -.2824
  d        .3976     .7175
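The claim that factoring scores and factoring correlations give identical results can be checked numerically for the first centroid. The sketch below assumes NumPy and uses the usual first-centroid rule for a correlation matrix (column sum divided by the square root of the grand total); that rule is supplied here as an assumption, since the paper does not spell it out, and all variable names are illustrative.

```python
import numpy as np

# Raw scores of Table 1 (tests in rows, persons in columns).
X = np.array([
    [1, 2, 3, 4],
    [1, 6, 6, 3],
    [11, 26, 36, 43],
    [9, 14, 24, 37],
    [4, 20, 21, 13],
], dtype=float)

dev = X - X.mean(axis=1, keepdims=True)
w = dev / np.linalg.norm(dev, axis=1, keepdims=True)

# Route 1: factor the scores. C1 is the normalized column total of w,
# and each test's coefficient is its projection on C1.
totals = w.sum(axis=0)
c1 = totals / np.linalg.norm(totals)
loadings_from_scores = w @ c1

# Route 2: factor the correlations, with unities in the diagonal.
R = np.corrcoef(X)  # equals w @ w.T for normalized deviations
loadings_from_correlations = R.sum(axis=0) / np.sqrt(R.sum())

print(np.round(loadings_from_scores, 4))
print(np.round(loadings_from_correlations, 4))
```

Both routes give the same five coefficients, which agree with the $C_1$ column of Table 4 to within rounding. The agreement follows because the column sums of `w @ w.T` are the projections of each test on the total vector.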

* The suggestion for using scores rather than correlations of variables is due
to Cyril Burt. (See Factors of the Mind, Chapter XVII.)

The remainder of the analysis is somewhat more involved. The
general scheme is to find the portion of the scores in Table 3
“attributable” to $C_1$, then to subtract this portion from the scores
in Table 3, and from the “residual” table to determine $C_2$ (taken
orthogonal to $C_1$), and so on until all the residual scores are zero,
showing that all factors have been eliminated. The final solution, as
it is called, is given in Table 4.

This series of equations connects observed scores with factor
scores. In the present form of analysis the factor scores were
computed directly from the data as indicated in Table 3. From a similar
table formed from the residual scores, the scores on the second factor,
$C_2$, were also determined. Both sets of these scores are given in
Table 5. For some phases of factor analysis we want the formal
expression of observed scores in terms of factor scores as in Table 4,
while for other problems we want the actual factor scores as in
Table 5. These two tables illustrate the most important ideas about
factor analysis. (The factor analyst will note that since all numbers
throughout are direction cosines a geometric interpretation of all
ideas may readily be made. Graphs of these relationships greatly
clarify all phases of factor analysis, but space does not permit their
inclusion in this paper.)

[FIGURE 3. The five test points plotted with respect to the centroid
axes $C_1$ and $C_2$, with oblique axes drawn through the clusters of
tests.]

The numbers in Table 4 may be interpreted as the coordinates of
the tests with respect to the $C_1$ and $C_2$ axes, and the five test
points may be plotted as shown in Fig. 3. The axes $C_1$ and $C_2$
constitute the graphical representation of one factor solution, while
the oblique axes taken through clusters of tests indicate another
factor solution for the same data. In general, various factor solutions
may be thought of as resulting from the rotation of axes, keeping the
original test points fixed. There are thus an infinite number of
possible factor solutions after the common-factor space (here a plane)
has been determined.

Let us briefly recapitulate what we have done. Starting with the 
scores on five tests which might appear to measure five distinct char- 
acteristics, we have made a factor analysis resulting in only two fac- 
tors. Thus all the original data of Table 1, or Table 3, consisting of 
five measurements of four persons, may be replaced by the data of 
Table 5 consisting of two factor scores for each of four persons. Thus 
economy of description has been achieved with no loss of any essential 
information about these persons. From the viewpoint of simplicity 
of description, therefore, it would appear that it is the person who re- 
fuses to make a factor analysis that is the foolish one, because he re- 
tains a needlessly complicated description of the data. 

An explanation of the data of Table 1 may further clarify the
foregoing points. The scores for tests 1 and 2 were first selected
($P_1$ and $P_2$ independent). Then the scores for the remaining tests
were constructed using the scores for the first two tests in various
linear combinations, e.g., the score for person a on test 3 was 10 times
his score on test 1 plus his score on test 2, or, for all persons,
$w_{3i} = 10w_{1i} + w_{2i}$, $i = a, b, c, d$ in turn. (Mathematically
this means that test 3 and, similarly, tests 4 and 5 are linearly
dependent on tests 1 and 2.) In terms of measurement this means that
tests 3, 4, and 5 contain no new information about the four persons
measured. All the essential information about these four persons is
contained in tests 1 and 2, or in any other pair of tests, or in factors
$C_1$ and $C_2$, or in the oblique factors $y_1$ and $y_2$, or in any
other two axes in the plane of Fig. 3, inasmuch as the coordinates of
the five points may be referred to any such pair of axes.
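The linear dependence described here can be confirmed directly from the raw scores of Table 1. This is a minimal sketch assuming NumPy; the paper states only the combination used for test 3, so the coefficients printed for tests 4 and 5 are recovered from the data by least squares rather than quoted from the text. The score of person c on test 3 is entered as 36, the value consistent with Table 2.

```python
import numpy as np

# Raw scores of Table 1 (tests in rows, persons in columns).
X = np.array([
    [1, 2, 3, 4],
    [1, 6, 6, 3],
    [11, 26, 36, 43],
    [9, 14, 24, 37],
    [4, 20, 21, 13],
], dtype=float)

# Five tests, but only two linearly independent rows.
rank = np.linalg.matrix_rank(X)

# Express tests 3, 4, 5 as combinations of tests 1 and 2:
# solve  [test1 test2] @ coeffs = [test3 test4 test5]  over the persons.
basis = X[:2].T          # shape (4 persons, 2 tests)
targets = X[2:].T        # shape (4 persons, 3 tests)
coeffs, *_ = np.linalg.lstsq(basis, targets, rcond=None)

print(rank)
for test_number, (on_test1, on_test2) in zip((3, 4, 5), coeffs.T):
    print(f"test {test_number} = {on_test1:.0f} * test 1 + {on_test2:.0f} * test 2")
```

The fit is exact for every person, which is the numerical counterpart of the statement that tests 3, 4, and 5 contain no new information.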

Now although this problem was made artificially simple so that 
it could be pictured on paper, it illustrates the general idea that the 
factor analyst has in mind. He wants to use the smallest space in 
which the data may be thought of as contained. Actually he is usually 
interested in what is called the “common-factor space,” disregarding 
the “unique-factor space” in which errors of various sorts may be 
thought of as represented. We shall not attempt to discuss these no- 
tions, however, in the present paper. 

Having seen that a factorial description of data is an economical
one, let us now turn to the question of whether or not such a
description is a “meaningful” one. If there were some reason, apart
from the statistical analysis, for attaching some fundamental meaning
to, say, tests 1 and 2, then these might be selected as the factors and
all other tests (points) referred to them in the plane. If, on the other
hand, all tests appear equally “meaningful,” such a basis of selecting
the factors could not be employed. Suppose we turn then to the
statistical analysis to try to answer the question. In Fig. 3 the
oblique axes $y_1$ and $y_2$ pass through clusters of tests (1, 3, 4)
and (2, 5). We should, of course, imagine more tests in each cluster in
a problem with actual data, but the diagram in the figure illustrates
the idea. It might be argued that a cluster of tests (1, 3, 4) gives
evidence of something fundamental among them and that this fundamental
characteristic is represented by $y_1$. In the present example,
however, $y_1$ and $y_2$ were taken as factors merely because Miss
Frances Swineford, who did all the calculations, fixed up the clusters
so that two axes might be passed through them. In other words, she
chose the variables so that we would get two such factors if we wanted
to. These two oblique axes give an elegant statistical description of
the data, but they represent nothing more inherently fundamental than
the manner in which the variables were chosen.

Suppose test 1 be thought of as one side of a rectangle and test
2 the other side. The other tests would then be regarded as linear
functions of these dimensions. Our factor analysis does not necessarily
“discover” these original fundamental dimensions, but $y_1$ and $y_2$,
which are close to them (in this example). We could, of course, pick
our variables so that we could “discover” test 1 and test 2 exactly,
but this would illustrate nothing but a way of picking variables.

The sides of the rectangle are convenient aspects that describe 
the rectangle completely, but other aspects of the rectangles might be 
considered more fundamental for certain problems. Thus if we are 
building fences around lots, we might want to use one type of material 
across the front and a different type of material on the other three 
sides. One variable would then be X and the other 2Y + X, if X and 
Y are the sides of the lots. Such data would be more fundamental in 
the sense that they would be the measurements actually used in build- 
ing the fences. Again data could be fixed up so that we could “dis- 
cover” either the values of X and Y, or X and 2Y + X. In other words, 
we can discover what we please to consider as fundamental. To extend 
these notions to actual test data and argue that one factorial solution 
gives more fundamental meaning than another is extremely mislead- 
ing, because it amounts to asking others to accept one arbitrary in- 
terpretation when many other equally good interpretations are pos- 
sible. 

This type of “conditioned thinking” reminds me of my first
experience with an English cigarette while in London. I had traded an
American cigarette for an English one with a lady and was struggling
with the tightly packed English variety, when she remarked, “These
aren’t cigarettes, they are packed too loose,” to which I replied that
what she gave me wasn’t a cigarette because it was packed too tight.
Some of the arguments about factor analysis do not rise above this
plane.

The comparison of various factorial solutions in the same factor- 
space is analogous to the comparison of various derived scores which 
may be thought of as in a one-space. Thus if mean I.Q. = 100, and 
S.D. = 16, then a person’s score (say 92) may be represented in the 
following ways: 


     $\text{I.Q.} = 92$,                                                  (1)
     $z = -.5$, where $z = (\text{I.Q.} - 100)/16$,                       (2)
     $T = 45$, where $T = 10z + 50$.                                      (3)


Each of these three types of scores is a linear function of any other. 
They are all equally satisfactory measures in a one-space, differing 
only in their statistical characteristics given above. 
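The three scores above are related by simple linear transformations, which the following sketch makes explicit. The function names are illustrative only; the mean of 100 and S.D. of 16 are the values given in the text.

```python
def z_score(iq, mean=100.0, sd=16.0):
    """Standard score: deviation from the mean in S.D. units."""
    return (iq - mean) / sd


def t_score(z):
    """T score: a z score rescaled to mean 50 and S.D. 10."""
    return 10.0 * z + 50.0


def iq_from_t(t, mean=100.0, sd=16.0):
    """Invert both transformations to recover the I.Q."""
    return mean + sd * (t - 50.0) / 10.0


iq = 92.0
z = z_score(iq)
t = t_score(z)
print(iq, z, t, iq_from_t(t))
```

Since each transformation can be inverted, any one of the three scores determines the other two, which is the sense in which they are equally satisfactory measures in a one-space.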

A student of mine (a poor one) once said: “All this is very con- 
fusing to me. Surely only one of these can be right. The z scores being 
either positive or negative have no psychological meaning. The I.Q. 
seems to me the fundamental one we should use.” 

What about the names of the factors? We can use names if we
want to. Thus if tests 1, 3, and 4 had been actual data, say three verbal
tests, we might call $y_1$ a verbal factor. Such a name may be a helpful
interpretation to the extent that we know what tests 1, 3, and 4 are
like, and also to the extent that we know what the people taking the
tests were like. Names, however, have no scientific value and may
result in a harmful type of verbalism, i.e., thinking we know what
something is because we have named it. Names are arbitrary
categories which may be transformed to other categories such as $y_1$
and $y_2$. To the person who thinks tests mean something and factors do
not we may say, “Have your verbal coordinate system transformed.”

To the psychologist mentioned at the beginning of the paper, I 
would then say that people factor because they get a simpler picture 
of their data when they do so. Factors are as meaningful as tests, and 
one set of factors is as meaningful as another. Furthermore, no people 
who factor “discover” anything more fundamental than any other 
such people, provided all these people stay in the same space. 













Technical Appendix 


Assume a linear relationship between scores on tests and
underlying factors of the form:

     $w_{ji} = a_{j1}G_{1i} + a_{j2}G_{2i} + a_{j3}G_{3i} + \cdots + a_{jm}G_{mi}$,  or      (1)
     $w_{ji} = a_{js}G_{si}$   $(s = 1, 2, 3, \ldots, m)$                                     (1)′

where $w_{ji}$ are normalized test scores, $a_{js}$ are coefficients in
these linear expressions, and $G_{si}$ are normalized factor scores.

     $j = 1, 2, 3, \ldots, n$   (variables)
     $i = 1, 2, 3, \ldots, N$   (persons)
     $s = 1, 2, 3, \ldots, m$   (factors)

The terms in (1) are to be considered as matrices with the usual
row-by-column multiplication. Factor analysis is concerned with two
problems: (a) the determination of the $a_{js}$ which gives the solution
of form (1), or (b) the determination of factor scores (or estimates).

For simplicity we shall consider here only uncorrelated factors.
Under this condition the various values of $a_{js}$ in equation (1) can
be readily determined. Post-multiplying both sides of (1)′ by $G_{is}$
we find

     $w_{ji}G_{is} = a_{js}[G_{si}G_{is}] = a_{js}$,                      (2)

where the product in brackets is the identity matrix if the factors are
uncorrelated. The matrices $a_{js}$ can thus be found if we can
determine $G_{si}$.

The calculations, as illustrated in part by the hypothetical
example, may be outlined as follows:

(a) Calculate $G_{1i}$ (as shown in Table 3).

(b) Calculate $a_{j1}$ by formula (2) ($s = 1$).

(c) Calculate the first term of equation (1) and denote this by
${}_1w_{ji}$ (part of $w_{ji}$ attributable to $C_1$).

In matrix form, ${}_1w_{ji} = a_{j1}G_{1i}$.

(d) Subtract ${}_1w_{ji}$ from $w_{ji}$ to obtain the “residual score
matrix” from which the centroid $C_2$ is obtained.

(e) Continue in similar fashion until the residual scores are
zero (or small enough to be considered negligible).
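Steps (a) through (e) can be sketched as a short loop. This is an illustration under stated assumptions rather than the procedure actually used for the paper: NumPy is assumed, the function and variable names are hypothetical, and the sign-reflection rule applied before each centroid is taken (reflect every test whose residual points away from the first test that still has a non-negligible residual) is an assumption added here, since the text does not say how signs were handled once the residual column totals vanish.

```python
import numpy as np

X = np.array([
    [1, 2, 3, 4],
    [1, 6, 6, 3],
    [11, 26, 36, 43],
    [9, 14, 24, 37],
    [4, 20, 21, 13],
], dtype=float)


def normalize_rows(m):
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def centroid_factor_scores(w, tol=1e-9):
    """Extract centroid factors from normalized scores w (tests x persons)."""
    residual = w.copy()
    loadings, scores = [], []
    for _ in range(min(w.shape)):
        norms = np.linalg.norm(residual, axis=1)
        if norms.max() <= tol:
            break  # step (e): residual scores are negligible
        # Assumed reflection rule (not from the paper), see lead-in.
        reference = residual[np.argmax(norms > tol)]
        signs = np.where(residual @ reference < 0, -1.0, 1.0)
        totals = (signs[:, None] * residual).sum(axis=0)
        g = totals / np.linalg.norm(totals)        # step (a): factor scores
        a = residual @ g                           # step (b): formula (2)
        residual = residual - np.outer(a, g)       # steps (c) and (d)
        loadings.append(a)
        scores.append(g)
    return np.column_stack(loadings), np.vstack(scores), residual


w = normalize_rows(X - X.mean(axis=1, keepdims=True))
A, G, residual = centroid_factor_scores(w)
print(np.round(A, 4))   # compare with Table 4
print(np.round(G, 4))   # compare with Table 5
```

For the data of Table 1 the loop stops after two factors, and the product `A @ G` reproduces the normalized scores, in line with the statement that the score matrix has rank 2.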

Burt, to whom this idea is due, does not give a standard of when
to stop factoring scores. In our hypothetical example, the rank of
$w_{ji}$ was 2, so that the residual elements of $w_{ji}$ were all
exactly zero after the extraction of $C_2$. The method is described here
because it furnishes a simple interpretation of factor analysis. It
may, however, prove a useful form of analysis, particularly with
punch-card computation, provided the factoring of total scores can be
justified.














PSYCHOMETRIKA—VOL. 7, NO. 3 
SEPTEMBER, 1942 


NOTE ON THE RELATIONSHIP BETWEEN INTERNAL 
CONSISTENCY AND TEST-RETEST ESTIMATES 
OF THE RELIABILITY OF A TEST 


ROBERT W. B. JACKSON 
DEPARTMENT OF EDUCATIONAL RESEARCH, UNIVERSITY OF TORONTO 


A comparison of the “test-retest” and “internal consistency” 
estimates of reliability coefficients is given, and it is shown that the 
two methods give different results. Application of the analysis of 
variance and covariance method reveals that there is not just one 
but a number of reliability coefficients involved, and that an esti- 
mate of each of these may be obtained. The analysis shows that in 
using the test-retest method the error or remainder effects are not 
independent on the two trials, possibly because the individuals re- 
member the items and their responses to them on the previous trial. 


A new approach to the problem of measuring the reliability of a
test has been suggested by Kuder and Richardson,* and Hoyt.† This
consists essentially in analyzing the results of a single application of
the test and, subject to certain assumptions, forming an estimate of
the reliability of the test from the results of this analysis. The older
methods of estimating test reliability, except in the case of the
split-half or odds-even method, involve two applications of the test;
either the same test is repeated (the test-retest method) or an
equivalent form of the test is given on the second trial. This latter
approach follows the general procedure used in the other sciences in
determining the accuracy of measurement, i.e., the measurements are
repeated and an estimate of the errors of measurement calculated from
the differences between the two sets of measurements.

* Kuder, G. F. and Richardson, M. W. The theory of the estimation of test
reliability. Psychometrika, 1937, 2, 151-60.
† Hoyt, C. Test reliability estimated by analysis of variance. Psychometrika,
1941, 6, 153-160.

Clearly the two methods of approach are very different, and it
does not follow that they will give the same results. The new approach
involves an analysis of the internal structure, or consistency, of the
test, whereas the older approach involves a comparison of two sets
of measurements of the same subjects. The present paper describes
an attempt to determine empirically the relationship between the
“internal consistency” and the “test-retest” estimates of the
reliability of a test. The analysis of variance method suggested by
Jackson* and Hoyt† will be used as this seems to be the best approach
to follow.

TABLE 1
ANALYSIS OF VARIANCE AND COVARIANCE OF RESPONSES ON EACH SUBTEST

A: Subtest 1
                             Sum of Squares       Mean Square       Sum of
Variance            d.f.   Trial 1   Trial 2   Trial 1   Trial 2   Products
Between Items        16     67.48     58.14    4.2175    3.6338      60.27
Between Persons      49     29.09     34.46     .5937     .7033      26.71
Error               784    113.11    106.68     .1443     .1361      47.91
Total               849    209.68    199.28     .2470     .2347     134.89

B: Subtest 2
                             Sum of Squares       Mean Square       Sum of
Variance            d.f.   Trial 1   Trial 2   Trial 1   Trial 2   Products
Between Items        16     61.80     43.28    3.8625    2.7050      48.62
Between Persons      49     25.93     25.86     .5292     .5278      20.63
Error               784    105.49     90.48     .1346     .1154      32.33
Total               849    193.22    159.62     .2276     .1880     101.58

C: Subtest 3
                             Sum of Squares       Mean Square       Sum of
Variance            d.f.   Trial 1   Trial 2   Trial 1   Trial 2   Products
Between Items        16     35.21     32.85    2.2006    2.0531      32.95
Between Persons      49     44.97     41.56     .9178     .8482      39.24
Error               784    120.79    111.62     .1541     .1424      56.34
Total               849    200.97    186.03     .2367     .2191     128.53

D: Subtest 4
                             Sum of Squares       Mean Square       Sum of
Variance            d.f.   Trial 1   Trial 2   Trial 1   Trial 2   Products
Between Items        14     32.08     36.83    2.2914    2.6307      31.92
Between Persons      49     32.00     31.06     .6531     .6339      23.03
Error               686    123.12    118.90     .1795     .1733      44.01
Total               749    187.20    186.79     .2499     .2494      98.96

E: Subtest 5
                             Sum of Squares       Mean Square       Sum of
Variance            d.f.   Trial 1   Trial 2   Trial 1   Trial 2   Products
Between Items        14     32.26     29.62    2.3043    2.1157      29.77
Between Persons      49     35.69     38.35     .7284     .7827      30.62
Error               686    119.07    100.65     .1736     .1467      42.10
Total               749    187.02    168.62     .2497     .2251     102.49

A group intelligence test composed of five subtests and contain- 
ing altogether 81 items was chosen for use in the experiment. One 
form of this test (Form B) was administered to a class of 50 pupils 
and repeated one week later. This gave two sets of results which 
were comparable with regard to both the items used and the individu- 
als tested. Hence there could be made an analysis of the results for 
each application of the test and also an analysis of the relationship 
between the two sets of results. This was done by using the general 
method known as the analysis of variance and covariance. The arith- 
metical procedure to be followed in making such an analysis has been 
explained elsewhere‡ and need not be repeated here. The results of
the analysis for each of the subtests are given in Table 1. 


























TABLE 2
ESTIMATES OF THE RELIABILITY COEFFICIENTS (BETWEEN PERSONS)

             Test-retest     Internal Consistency Estimates
Subtest       Estimates          Trial 1       Trial 2
   1            .844              .757          .806
   2            .797              .746          .781
   3            .908              .832          .832
   4            .730              .725          .727
   5            .828              .762          .813














The usual test-retest estimates of the reliability coefficients for
the subtests can be obtained from the “Sum of Squares” and “Sum of
Products” values given in the “Between Persons” row of these tables;
these are given in the second column of Table 2. Similarly, the
internal consistency estimates of the reliability coefficients can be
obtained from the “Between Persons” and “Error” mean square values (for
each trial); these are given in columns three and four of Table 2. It
will be seen that in general the test-retest estimates are slightly
higher than the internal consistency estimates. Apparently these values
are not estimates of the same thing; as a matter of fact there is no
reason why they should be, because the “Between Persons” and “Error”
sums of squares and sums of products of these tables are, by
definition, independent of each other.


* Jackson, R. W. B. Reliability of mental tests. British Journal of
Psychology, 1939, 29, 267-87.
† Hoyt, C. Op. cit.
‡ Jackson, R. W. B. Application of the analysis of variance and covariance
method to educational problems. Toronto: Department of Educational Research,
Bulletin No. 11, 1940, 108.













In the present case, the test-retest estimates are consistently 
higher than the internal consistency estimates. As has been shown 
elsewhere,* however, this is not always the case. The latter estimates 
may be higher or lower than the former; apparently the relationship 
depends on the nature of the material composing the subtest. 

This method of analyzing the responses makes it possible to ob- 
tain estimates of other kinds of reliability coefficients. A test-retest 
estimate of the reliability coefficient may be calculated for each row 
of the tables (for each subtest) given in Table 1. The first two esti- 
mates in each case refer to the reliability of the difficulty values and 
individual scores, respectively. The estimates for the “Total” rows 
are not very meaningful (being a weighted average of the others) and 
might well be omitted, but the estimates for the “Error” rows are im- 
portant as they show that these “error” values are not independent 
on repeated trials of a test. It is impossible to determine exactly what 
causes this effect, but it may be that in the second testing period in- 
dividuals remember the items and their responses to them on the first 
trial. The complete set of these test-retest estimates is given in Table 
3. 

TABLE 3
COMPLETE SET OF TEST-RETEST ESTIMATES OF RELIABILITY COEFFICIENTS
(BY SUBTESTS)

                                  Reliability Coefficients
Type               Subtest 1   Subtest 2   Subtest 3   Subtest 4   Subtest 5
Between Items        .962        .940        .969        .929        .963
Between Persons      .844        .797        .908        .730        .828
Error                .436        .331        .485        .364        .385
Total                .660        .578        .665        .529        .577








Instead of using the correlation technique, an analysis of vari- 
ance of the difficulty values and scores may be made as shown in 
Tables 4 and 5, respectively. From these analyses the appropriate es- 
timates of the errors of measurement of the difficulty values or scores 
are obtained directly from the entry in the “Mean Square” columns 
and “Error” rows of the tables. These mean square values are not 
directly comparable with those given in Table 4, however, as they do 
not refer to an individual item response. Dividing by the number of 
persons and number of items, respectively, values are obtained which 
are directly comparable with those given in the “Error” rows of 


* Jackson, R. W. B. and Ferguson, G. A. Studies on the reliability of tests. 
Toronto: Department of Educational Research, Bulletin No. 12, 1941. 

















ROBERT W. B. JACKSON 


161 






















































































TABLE 4
ANALYSIS OF VARIANCE OF DIFFICULTY VALUES (BY SUBTESTS)

A: Subtest 1
Variance           d.f.   Sum of Squares   Mean Square
Between Trials       1          95.56          95.56
Between Items       16        6153.88         384.62
Error               16         126.94           7.93
Total               33        6376.38

B: Subtest 2
Variance           d.f.   Sum of Squares   Mean Square
Between Trials       1         207.52         207.52
Between Items       16        5058.00         316.13
Error               16         196.48          12.28
Total               33        5462.00

C: Subtest 3
Variance           d.f.   Sum of Squares   Mean Square
Between Trials       1          76.50          76.50
Between Items       16        3348.94         209.31
Error               16          54.00           3.38
Total               33        3479.44

D: Subtest 4
Variance           d.f.   Sum of Squares   Mean Square
Between Trials       1          48.14          48.14
Between Items       14        3318.86         237.06
Error               14         126.87           9.06
Total               29        3493.87

E: Subtest 5
Variance           d.f.   Sum of Squares   Mean Square
Between Trials       1         333.34         333.34
Between Items       14        3035.20         216.80
Error               14          58.66           4.19
Total               29        3427.20














Table 1. For convenience, these various estimates have been given in
Table 6; clearly there is, in most cases, little agreement between these 
values. These results agree with those found by using the correlation 




































































































TABLE 5
ANALYSIS OF VARIANCE OF SCORES (BY SUBTESTS)

A: Subtest 1
Variance           d.f.   Sum of Squares   Mean Square
Between Trials       1          32.49          32.49
Between Persons     49         994.25          20.29
Error               49          86.01           1.76
Total               99        1112.75

B: Subtest 2
Variance           d.f.   Sum of Squares   Mean Square
Between Trials       1          70.56          70.56
Between Persons     49         791.00          16.14
Error               49          89.44           1.83
Total               99         951.00

C: Subtest 3
Variance           d.f.   Sum of Squares   Mean Square
Between Trials       1          26.01          26.01
Between Persons     49        1402.49          28.62
Error               49          68.49           1.40
Total               99        1496.99

D: Subtest 4
Variance           d.f.   Sum of Squares   Mean Square
Between Trials       1          14.44          14.44
Between Persons     49         818.36          16.70
Error               49         127.56           2.60
Total               99         960.36

E: Subtest 5
Variance           d.f.   Sum of Squares   Mean Square
Between Trials       1         100.00         100.00
Between Persons     49        1014.56          20.71
Error               49          96.00           1.96
Total               99        1210.56


































technique, and support our original statement that the test-retest and 
internal consistency estimates do not necessarily refer to the same 
thing. 
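
The adjustment described above, in which the error mean squares of Tables 4 and 5 are divided by the number of persons and the number of items respectively, can be sketched in a few lines. The sketch is illustrative only. The counts of 50 persons and of 17, 17, 17, 15, and 15 items are assumptions inferred from the degrees of freedom in those tables (d.f. plus one), and the names used are hypothetical.

```python
# Error mean squares read from Tables 4 and 5, one entry per subtest.
ERROR_MS_DIFFICULTY = [7.93, 12.28, 3.38, 9.06, 4.19]  # Table 4
ERROR_MS_SCORES = [1.76, 1.83, 1.40, 2.60, 1.96]       # Table 5

# Assumed counts, inferred from the degrees of freedom (d.f. + 1).
N_PERSONS = 50
N_ITEMS = [17, 17, 17, 15, 15]


def per_response_errors():
    """Return error estimates on the basis of an individual item response.

    The first list rescales the Table 4 entries (divide by persons),
    the second rescales the Table 5 entries (divide by items).
    """
    from_difficulty = [ms / N_PERSONS for ms in ERROR_MS_DIFFICULTY]
    from_scores = [ms / n for ms, n in zip(ERROR_MS_SCORES, N_ITEMS)]
    return from_difficulty, from_scores


if __name__ == "__main__":
    difficulty, scores = per_response_errors()
    for subtest, (a, b) in enumerate(zip(difficulty, scores), start=1):
        print(f"Subtest {subtest}: {a:.4f}  {b:.4f}")
```

Under these assumptions the quotients agree with the last two columns of Table 6 to within rounding of the tabled mean squares.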

Tests of the significance of the differences between the values 
given in Table 6 should be applied, but, except for the values given in 
the second and third columns, it is difficult to decide just what is
the number of degrees of freedom associated with the values, and 
hence involved in the comparison. Possibly the values should not be 
compared as shown in Table 6, but the values from Table 1 adjusted 
to correspond to those shown in either Table 4 or 5, and the compari- 
son made on this basis. The procedure to be followed will presumably 
depend on the kind of errors in which we are interested, i. e., errors 
of an individual response, score, or difficulty value. 

For the particular test considered here, the simple unweighted 
sum of the scores on the different subtests is used as a score for the 
test as a whole. To determine the effect of so combining the subtest 
scores, an analysis of variance and covariance of the responses on 
the whole test may be made. This analysis is given in Table 7. The 


TABLE 6

COMPARISON OF ESTIMATES OF ERROR (ON THE BASIS OF AN INDIVIDUAL
ITEM RESPONSE) BY SUBTESTS

              From Table 1
Subtest    Trial 1    Trial 2    Table 4    Table 5
   1        .1443      .1361      .1587      .1033
   2        .1346      .1154      .2456      .1074
   3        .1541      .1424      .0675      .0822
   4        .1795      .1733      .1812      .1736
   5        .1736      .1467      .0838      .1306

















“Between Subtests” and “Interaction between Subtests” entries in the 
mean square columns show the effect of combining the subtest scores 
in this manner. Clearly the subtests differ considerably in difficulty, 
but except for this effect little is lost in using an unweighted instead 
of a weighted sum of the subtest scores. The interaction mean squares 
are significantly greater than the corresponding error mean squares, 
but in comparison with the differences between subtests and between 
items within subtests (of the first two rows) this interaction effect is 
relatively unimportant.
Finally, the complete set of test-retest estimates of reliability co- 













TABLE 7
ANALYSIS OF VARIANCE AND COVARIANCE OF THE RESPONSES ON THE
WHOLE BATTERY OF 5 SUBTESTS

                                   Sum of Squares       Mean Square       Sum of
Variance                   d.f.   Trial 1   Trial 2   Trial 1   Trial 2   Products
Between Subtests              4     15.05     20.27    3.7625    5.0675      15.89
Between Items
  Within Subtests            76    228.83    200.73    3.0109    2.6412     203.52
Between Persons              49    116.05    114.67    2.3684    2.3402     109.49
Interaction
  Between Subtests          196     51.63     56.62     .2634     .2889      30.73
Error                      3724    581.58    528.33     .1562     .1419     222.69
Total                      4049    993.14    920.62     .2453     .2274     582.32











TABLE 8

COMPLETE SET OF TEST-RETEST ESTIMATES OF RELIABILITY COEFFICIENTS
(FOR THE WHOLE TEST)

Type                               Reliability Coefficients
Between Subtests                            .910
Between Items Within Subtests               .950
Between Persons                             .949
Interaction Between Subtests                .568
Error                                       .402
Total                                       .609





efficients for the whole test may be calculated. These are given in 
Table 8. The interpretation of these values is similar to that con- 
sidered previously, and again the estimate given in the “Total” row 
is of little interest. 
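
The entries of Table 8 can be recovered from Table 7. The sketch below assumes that each coefficient is the sum of products divided by the geometric mean of the two sums of squares. That formula is not stated in this section, but it reproduces the tabled values to three decimals. The function names are hypothetical.

```python
import math

# (sum of squares trial 1, sum of squares trial 2, sum of products), Table 7.
TABLE_7 = {
    "Between Subtests": (15.05, 20.27, 15.89),
    "Between Items Within Subtests": (228.83, 200.73, 203.52),
    "Between Persons": (116.05, 114.67, 109.49),
    "Interaction Between Subtests": (51.63, 56.62, 30.73),
    "Error": (581.58, 528.33, 222.69),
    "Total": (993.14, 920.62, 582.32),
}


def retest_coefficient(ss_trial_1, ss_trial_2, sum_of_products):
    """Assumed form: covariance term over the geometric mean of the variances."""
    return sum_of_products / math.sqrt(ss_trial_1 * ss_trial_2)


def table_8():
    """Coefficient for every row of Table 7, rounded to three decimals."""
    return {name: round(retest_coefficient(*row), 3)
            for name, row in TABLE_7.items()}


if __name__ == "__main__":
    for name, value in table_8().items():
        print(f"{name:32s} {value:.3f}")
```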

It may be noted in passing that a great deal of information con- 
cerning the internal structure of a test may be extracted by using 
the analysis of variance and covariance technique. The analysis of 
covariance is particularly useful as it enables us to determine the re- 
lationship of the various effects on repeated trials of the test. 




















PSYCHOMETRIKA—VOL. 7, NO. 3 
SEPTEMBER, 1942 


“BLOCKS TEST” OF MULTIPLE RESPONSE 


C. H. MCCLOY 
STATE UNIVERSITY OF IOWA 


A simple test of multiple serial response is proposed which 
makes use of materials that are inexpensive and easily prepared, and 
that do not get out of order. The test employs twenty-four colored 
blocks placed in two rows on a table. The total equipment needed 
consists of the blocks and a watch. 


There are several tests available to measure serial response. Most
of these, however, require complicated and expensive apparatus, 
which puts them beyond the reach of the ordinary teacher. For this 
reason we have prepared a rather simple test, which can be manu- 
factured very inexpensively. 

Prepare 24 blocks of wood, about 3 inches square and 1 inch 
thick. Paint the tops of 6 of these blocks red; 6, white; 6, blue; and
6, green. The bottoms of these blocks are painted in the same four 
colors according to the pattern given in Table 1 below. In this table 
the asterisk preceding the color on the bottom of the block means 
that a black ring is painted around the color on the bottom. This 
black color should not be on the edge of the block and should not be 
visible when the block is on the table. Serial numbers about 4 inch 
high should be painted in black on the tops of the blocks in order to 
facilitate the arrangement of the blocks in their proper order. 

The blocks are placed about 3 inches apart in 2 rows across the 
table in front of the subject. The arrangement is as follows: 


        1     3     . . . . . . . . . .     23

        2     4     . . . . . . . . . .     24
To take the test, the subject first memorizes the order of the colors:
red, white, blue, green; red, white, blue .... He is then instructed
to pick up the first block, look at the color on the bottom, replace
this block, and pick up the next block which is of the color, according
to the color-sequence, following the color on the bottom of the first
block; for example, the subject picks up the first block (red); the color
on the bottom is blue. He should replace this block and pick up the
next green block, for green is the color following blue in the color-
sequence; he should then look at the color on the bottom of the green
block, which is red; replacing this block, he should pick up the next




white block, for white is the color following red in the color-sequence; 
this is block number 6. The color on the bottom of block 6 is white 
with a black circle around it. Instead of picking up the next blue 
block, the subject should pick up the blue block preceding the white 
one which he had just picked up. In other words, two things must 
be remembered: If the bottom of a block is a solid color, the next 
block of the color following that color in the color-sequence should 
be picked up; if the bottom of the block has a black circle on it, the 
preceding block of the color following it in the color-sequence should 
be picked up. Therefore, according to the color-sequence (red-white- 
blue-green), a white block should always be picked up after a block 
with the color of red on the bottom, a blue block after a block with 
the color of white on the bottom, a green block after a block with the 
color of blue on the bottom, and a red block after a block with the 
color of green on the bottom. 


TABLE 1

Block Number    Top      Bottom       Number on Bottom
      1         red      blue                1
      2         white    any color           -
      3         blue     green               4
      4         green    red                 2
      5         red      white               5
      6         white    *white              3
      7         blue     green               6
      8         green    red                 8
      9         red      *blue               7
     10         white    white               9
     11         blue     red                10
     12         green    green              12
     13         red      white              13
     14         white    *blue              11
     15         blue     green              14
     16         green    any color           -
     17         red      white              15
     18         white    blue               17
     19         blue     *red               16
     20         green    green              18
     21         red      white              19
     22         white    blue               21
     23         blue     *red               20
     24         green    any color          22


In Table 2 the proper sequence is indicated. (The asterisk indi- 
cates that there is a black circle around the color on the bottom of the 


block.) 



















TABLE 2

           Block    Color     Color on              Block    Color
           Number   on Top    Bottom                Number   on Top
Pick up       1     red       blue       go to         4     green
Pick up       4     green     red        go to         6     white
Pick up       6     white     *white     go to         3     blue
Pick up       3     blue      green      go to         5     red
Pick up       5     red       white      go to         7     blue
Pick up       7     blue      green      go to         9     red
Pick up       9     red       *blue      go to         8     green
Pick up       8     green     red        go to        10     white
Pick up      10     white     white      go to        11     blue
Pick up      11     blue      red        go to        14     white
Pick up      14     white     *blue      go to        12     green
Pick up      12     green     green      go to        13     red
Pick up      13     red       white      go to        15     blue
Pick up      15     blue      green      go to        17     red
Pick up      17     red       white      go to        19     blue
Pick up      19     blue      *red       go to        18     white
Pick up      18     white     blue       go to        20     green
Pick up      20     green     green      go to        21     red
Pick up      21     red       white      go to        23     blue
Pick up      23     blue      *red       go to        22     white
Pick up      22     white     blue       go to        24     green
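
The pick-up order follows mechanically from the two rules and the bottom colors of Table 1, and it may be checked with a short program. The sketch below is illustrative only. The function names are hypothetical, and it assumes that "next" means the nearest higher-numbered block with the required top color and "preceding" the nearest lower-numbered one.

```python
COLORS = ["red", "white", "blue", "green"]

# Bottom color and black-ring flag for each block, copied from Table 1.
# Blocks 2, 16 and 24 ("any color") are left out: 2 and 16 are never
# reached, and block 24 is where the run ends.
BOTTOMS = {
    1: ("blue", False), 3: ("green", False), 4: ("red", False),
    5: ("white", False), 6: ("white", True), 7: ("green", False),
    8: ("red", False), 9: ("blue", True), 10: ("white", False),
    11: ("red", False), 12: ("green", False), 13: ("white", False),
    14: ("blue", True), 15: ("green", False), 17: ("white", False),
    18: ("blue", False), 19: ("red", True), 20: ("green", False),
    21: ("white", False), 22: ("blue", False), 23: ("red", True),
}


def top_color(block):
    """Tops repeat red, white, blue, green along the serial numbers."""
    return COLORS[(block - 1) % 4]


def next_block(block):
    """Apply the rule: one color beyond the bottom color, forward
    for a solid bottom and backward for a bottom with a black ring."""
    color, ringed = BOTTOMS[block]
    target = COLORS[(COLORS.index(color) + 1) % 4]
    step = -1 if ringed else 1
    candidate = block + step
    while 1 <= candidate <= 24:
        if top_color(candidate) == target:
            return candidate
        candidate += step
    return None


def pick_up_order(start=1):
    """Follow the rule from the first block until no further move exists."""
    order = [start]
    while order[-1] in BOTTOMS and len(order) <= 24:
        following = next_block(order[-1])
        if following is None:
            break
        order.append(following)
    return order


if __name__ == "__main__":
    print(pick_up_order())
```

Such a check is convenient when a new pattern of bottom colors is to be prepared, since it shows at once whether the run reaches the last block without repeating any block.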


The subject, being timed in seconds, should be allowed one prac- 
tice trial and two test trials. If he makes an error, he must correct 
it before going on.* 

This test is designed to measure speed in the making of correct 
choices, complicated partly by motor actions and partly by memory 
of the color-sequence. It can be administered rather rapidly, though 
it is primarily a research tool and will probably be used mainly as 
such. The correlation of this test with composite ratings of the abilities
of 300 senior high-school girls in basketball, volleyball, diving,
and the dance was .510. Hence it is considered that such multiple- 
response abilities will also be associated with many other psycho- 
motor abilities, especially in the field of athletics. This correlation 
would seem to be of sufficient significance to justify the inclusion of 
the test in various researches of motor capacity. 


* To aid the examiner, it is well to have the color of the bottom (with black
circle where indicated) together with a number indicating the order in which the 
block will be picked up. This order is indicated in the first column of Table 2. 
This information can be marked on a small white label and pasted on the edge 
of the block away from the subject. 

If the subject makes an error, say “Stop: what block do you pick up after 
——?” (naming the color on the bottom) ; or “What block do you pick up after 
—with a black ring?” 













TABLE 3 


NORMS FOR HIGH-SCHOOL GIRLS

These norms were computed from data from 283 senior-high-school girls, with
ages ranging from 13 to 19. The r with age was but .17. The norms are in the form
of T-scores, extrapolated. The relationship was somewhat curvilinear. The score
is the best in three trials.


Score on blocks test T-score 
(seconds) 

6 99 
10 89 
14 82 
18 = 
22 72 
26 68 
30 64 
34 61 
38 58 
42 55 
46 52 
50 49 
54 46 
58 44 
62 41 
66 39 
70 37 
74 34 
78 32 
82 30 
86 28 
90 26 
94 25 
98 23 

102 21 
106 20 


Standardized Directions 


(While giving directions, perform the indicated actions.) 

“This is a test of how rapidly you can make decisions that involve 
intelligent choice, and act upon them. You will note that these blocks 
are arranged in two rows, and that going in this order (point), the 
colors on the top are red, white, blue, green, red, white, etc. The bot- 
tom of each block is painted one of these four colors, also, but it may 
be any one of the four colors, regardless of the color on the top. On 
some of the blocks, in addition to the color on the bottom, there is a
black circle. 

“In taking this test, pick up this first (red) block (pick it up), 
raise it, and look at the bottom. Then put it back again, and pick up 



















the block beyond, that is, one color beyond the color on the bottom of 
the block in the red, white, blue, green, red sequence; e.g., the bottom 
of the first block is blue, so you pick up the next block with a green 
top; if the bottom of the block is green, pick up the next block whose 
top is red in color. If, however, the bottom of the block has a black 
ring on it, pick up the next color in the sequence, but before the block 
you have picked up; e.g., if I pick up this block (pick up No. 9), you
note that the bottom is blue with a black ring. You would then pick 
up this one (No. 8), whose top is green—the next color in the sequence. 

“Now let us try one or two. If I pick up this one (pick up No. 11, 
turn it over, and show that the bottom is red), which one should I 
pick up next? That is correct, I should pick up this one (No. 14). If
I pick up this one (pick up No. 19, which has a red bottom with a 
black ring), which would I pick up next? That is correct, it would 
be this one (No. 18). 

“Is the method clear now? All right, begin with this one (No. 
1) and continue to the end on the right.” 

(Note: Continue with the demonstration until the subject gives 
evidence that he understands the procedure. The first trial should not 
be complicated with lack of understanding. Pick the blocks for the 
demonstration at random, not in order. This prevents learning the 
sequence before taking the test. Experimenter may need to tell the 
subject to put blocks down again with the bottoms down.) 























PSYCHOMETRIKA—VOL. 7, NO. 3 
SEPTEMBER, 1942 


AN ANALYSIS OF LEARNING DATA WHICH DISTINGUISHES 
BETWEEN INITIAL PREFERENCE AND LEARNING ABILITY* 


HAROLD GULLIKSEN 
UNIVERSITY OF CHICAGO 


* The author wishes to acknowledge financial assistance from the Social Science
Research Committee of the University of Chicago in the completion of this
study.


Several sets of learning data furnished by I. Krechevsky have 
been analyzed in terms of meaningful parameters of the learning 
curve, and the changes in the frequency distributions of these para- 
meters with changes in the experimental conditions have been stud- 
ied. One of the parameters represents the animal’s initial preference 
for the light or dark, the other represents learning ability. The 
analysis shows that destruction of about ten or fifteen per cent of
the cortex increases the animal’s preference for the light and decreases
the learning ability slightly. By ordinary methods of analy-
sis, it is not possible to discover that both initial preference and 
learning ability have been changed by any given factor. 


The purpose of this paper is to show how the psychological in- 
terpretation of learning records is facilitated by analyzing them in 
terms of certain parameters of a learning curve equation. Some of 
the difficulties in the conventional mode of analysis will be considered
together with the attempt to overcome these difficulties by Ruch
and Thorndike’s method of “common points of mastery.” A new meth- 
od of analyzing learning data in terms of the parameters of the learn- 
ing curve will then be presented and applied to some learning data 
gathered by Krechevsky. It is believed that this new method of analy- 
sis retains the advantages and avoids some of the disadvantages of 
the method of “common points of mastery.” 


Common Points of Mastery 

Learning data are usually analyzed in terms of three different 
measures of the learning performance: (1) the total time taken by 
the subject to reach a given criterion of accuracy; (2) the total num- 
ber of errors made in reaching this criterion; or (3) the total num- 
ber of trials necessary to reach this criterion. 

These three measures of learning performance all have the same 
defect—they are influenced by several different psychologically im- 
portant factors. The subject’s “score” whether measured in time, 
errors, or trials is influenced partly by the learning ability of the 
subject, partly by the subject’s initial competence in the task, and 




partly by the criterion of excellence required by the experimenter. 
For example, if one subject takes fewer trials than another to learn 
a task, it may be because he started at a higher level of initial com- 
petence, or because he learned more rapidly, or because of some com- 
bination of both of these factors. A similar ambiguity obtains when 
subjects are compared in terms of total errors or total time. 

In order to avoid this difficulty and obtain some indication of the
subject’s learning ability which is independent of the subject’s initial 
competence, Ruch (7) and Thorndike (9) suggested the method of 
“common points of mastery.” In using this method on a group of 
learning records the experimenter ignores all scores below the best 
initial performance. This insures that all persons will start at the 
same initial score. “Common” final scores are secured by discarding 
all records above the poorest final performance. Having “common” 
initial and final scores for a group of subjects, it is then possible to 
determine the time, errors, or trials made by the subjects between
these two common points of mastery. 
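
The procedure just described may be sketched as follows. The sketch is illustrative only. The two typing records are hypothetical, scores are assumed to be such that a higher value is better, the "final performance" is taken to be the last recorded score, and the function names are not drawn from Ruch or Thorndike.

```python
def common_points(records):
    """Return (lower, upper): the best initial and the poorest final score."""
    lower = max(scores[0] for scores in records.values())
    upper = min(scores[-1] for scores in records.values())
    return lower, upper


def first_trial_reaching(scores, level):
    """Index of the first trial whose score is at least `level`."""
    for trial, score in enumerate(scores):
        if score >= level:
            return trial
    raise ValueError("level never reached in this record")


def trials_between(records):
    """Trials each subject needs to pass from the lower to the upper point."""
    lower, upper = common_points(records)
    return {
        name: first_trial_reaching(scores, upper)
        - first_trial_reaching(scores, lower)
        for name, scores in records.items()
    }


if __name__ == "__main__":
    # Hypothetical typing scores (words per minute) on successive trials.
    records = {
        "A": [10, 15, 22, 30, 41, 50],
        "B": [25, 28, 33, 38, 45],
    }
    print(common_points(records))
    print(trials_between(records))
```

In this hypothetical case everything subject A did below 25 words per minute, and above 45, is set aside, which shows how much of a record the method may discard.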

It is clear that this procedure requires the experimenter to dis- 
card a great deal of the data, but advocates of the method maintain 
that the gain in the precision and meaningfulness of the measures 
obtained is more important than the loss of a certain amount of data. 
Ruch (7) has pointed out that the method assumes that the rate of 
progress a person makes from the initial common point of mastery is 
independent of the method by which he reached that point. In a type- 
writing experiment, will a person who reaches twenty-five words per 
minute during the experiment, having started at ten words per min- 
ute progress beyond that rate more or less rapidly than he would if 
he began at twenty-five words per minute because he had had con- 
siderable previous experience in typing? As Ruch points out, this 
question is one of experimental fact and could be determined by prop- 
er experimental procedures. 

Another weakness of the method is that it fails to take adequate 
account of the effect of differences in final level of performance which 
will be reached by different subjects. For example, one would expect 
a person whose top typing speed was seventy-five words per minute 
to take much longer to improve from sixty to seventy words per min- 
ute than would a person whose top typing speed was 120 words per 
minute. Yet relative to his own final performance both persons might 
be improving with equal rapidity. 


Derivation 


It is possible to retain the advantages of the method of “common 
points of mastery” and to overcome some of the disadvantages of this 



















method by analyzing learning data in terms of certain parameters of
the learning curve. It is possible to find one parameter of the learning
curve which represents “learning ability” defined as the number
of errors made by each individual between two “common points of 
mastery.” When this is done, we find that in order to describe the 
curve completely another parameter which represents the subject’s 
“initial competence” is also needed. In general these parameters 
would be different for different types of learning curves. In this paper 
the method will be applied to a rational learning equation previously 
developed (1). The general form of this equation is 


$$u = \frac{g}{c}\left[1 - \left(\frac{h}{kw + h}\right)^{c/k}\right] \tag{1}$$


where the variables are 


    u, the cumulative errors, and
    w, the total trials, or cumulative correct responses,

and the constants are

    g, the initial strength of the incorrect response,
    h, the initial strength of the correct response,
    c, the effect of punishment, and
    k, the effect of reward.


For the case where k equals zero, equation (1) becomes

$$u = \frac{g}{c}\left(1 - e^{-(c/h)w}\right). \tag{2}$$

For the case where k equals c, equation (1) becomes

$$u = \frac{(g/c)\,w}{w + h/c}. \tag{3}$$


This equation is the one previously developed by Thurstone (10). 

Let us first consider a parameter that will indicate initial ability, 
initial competence, or initial preference, whichever one cares to call 
it. The number of errors made at the beginning of learning suggests 
itself as a reasonable measure of initial preference. If ninety per cent 
errors are made, the subject has a nine to one preference for the 
stimulus which the experimenter has designated as wrong. If fifty 
per cent errors are made at the beginning of training, the preference 
for the correct stimulus is equal to that for the incorrect stimulus, 
and so on. In the case of any cumulative error curve, such as (1), (2), 




and (3), the derivative represents errors per trial, and therefore the 
errors per trial at the beginning of training is given by the value of 
du/dw when w equals zero. 

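For equation (1)

$$\frac{du}{dw} = \frac{(g/k)\,(h/k)^{c/k}}{(w + h/k)^{(c/k)+1}}. \tag{4}$$

For equation (4) we find that when w = 0

$$\left(\frac{du}{dw}\right)_{w=0} = \frac{g}{h}. \tag{5}$$

Similarly for equation (2), we find that

$$\frac{du}{dw} = \frac{g}{h}\,e^{-(c/h)w}. \tag{6}$$

Evaluating this expression when w = 0 gives

$$\left(\frac{du}{dw}\right)_{w=0} = \frac{g}{h}. \tag{7}$$

For equation (3)

$$\frac{du}{dw} = \frac{hg/c^{2}}{(w + h/c)^{2}}. \tag{8}$$

When w = 0 equation (8) becomes

$$\left(\frac{du}{dw}\right)_{w=0} = \frac{g}{h}. \tag{9}$$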


We see then that for equations (1), (2), and (3) the expression g/h 
represents the subject’s initial competence or his tendency to make 
errors at the beginning of the learning task. This quantity is the 
product of the two parameters shown for equation (2), and the ratio 
of the two parameters indicated in equation (3), so that after fitting 
the equation it is easy to calculate the parameter representing initial 
preference. 
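
The initial slope g/h may be checked numerically for the three cases. The sketch below is illustrative only. It assumes that equation (1) has the form u = (g/c)[1 - (h/(kw + h))^(c/k)], which is the form whose derivative is (4) and whose limit for large w is g/c. The parameter values and function names are hypothetical.

```python
import math


def cumulative_errors(w, g, h, c, k):
    """Cumulative errors u after w trials (assumed form of equation 1).

    k = 0 is handled as the limiting case, equation (2);
    k = c reduces algebraically to equation (3).
    """
    if k == 0:
        return (g / c) * (1.0 - math.exp(-c * w / h))
    return (g / c) * (1.0 - (h / (k * w + h)) ** (c / k))


def initial_slope(g, h, c, k, dw=1e-6):
    """Forward-difference estimate of du/dw at w = 0."""
    rise = cumulative_errors(dw, g, h, c, k) - cumulative_errors(0.0, g, h, c, k)
    return rise / dw


if __name__ == "__main__":
    g, h, c = 3.0, 4.0, 2.0          # hypothetical constants
    for k in (0.0, 0.5, c):          # equations (2), (1) and (3)
        print(k, round(initial_slope(g, h, c, k), 4), g / h)
```

Whatever the value of k, the estimate agrees with g/h, so that the initial preference parameter does not depend on the effect of reward.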

Let us turn now to the problem of selecting a parameter of the 
learning curve which represents learning ability as determined by the 
method of common points of mastery. First let us select two points 
of mastery which are the same for all subjects. These points need not 
necessarily be within the range of the experimentally obtained curve 
for all subjects, since if the curve is well-enough defined to enable one 
to determine its parameters, one can then build up the entire curve. 
It is only necessary that the data be sufficient to enable a determina- 
tion of the parameters of that particular curve with reasonable ac- 
curacy. The two common points of mastery chosen were the points of 
zero errors per trial and of one error per trial. Since for the cumula- 



















tive error curve the errors per trial at any point are represented by 
the derivative at that point, the point at which the subject is making 
one error per trial is where du/dw equals 1. The point at which the 
subject is making zero errors per trial is where du/dw equals zero. 

The method of common points of mastery is to take the number 
of errors or trials necessary to progress from the one common point 
to the other. Since for equations (1), (2), and (3) the number of 
trials to reach zero errors per trial is infinite, we will use the number 
of errors made in going from the point of one error per trial to the 
point of zero errors per trial as the measure of learning ability. The 
larger the number of errors made between these two common points, 
the less the learning ability. Stated in mathematical terms, this meas- 
ure of learning ability is equal to the number of errors taken to reach 
perfection, or zero errors per trial, minus the number of errors taken 
to reach a criterion of one error per trial. Symbolically we may rep- 
resent this learning ability by L , the value of the derivative by sub- 
scripts and write 


$$L = u_0 - u_1. \tag{10}$$


This is a general form which would apply to any cumulative error 
curve that approaches a horizontal asymptote. 

The evaluation of u as du/dw approaches zero is easily accom- 
plished. From equations (4), (6), and (8), it can be seen that w ap- 
proaches infinity as du/dw approaches zero for each of these equa- 
tions. Equations (1), (2), and (3) show that as w approaches infinity 
u approaches g/c in each case. For all three cases then we find that 
when du/dw equals zero we have 


$$u_0 = \frac{g}{c}. \tag{11}$$


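The evaluation of u at the point where du/dw = 1 will be considered
next. From equation (4) it can be shown that when du/dw = 1, w = w_1, where

$$w_1 = \left(\frac{g}{k}\right)^{k/(c+k)}\left(\frac{h}{k}\right)^{c/(c+k)} - \frac{h}{k}. \tag{13}$$

Substituting this value of w in equation (1) gives

$$u_1 = \frac{g}{c}\left[1 - \left(\frac{h}{g}\right)^{c/(c+k)}\right]. \tag{14}$$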













Substituting (11) and (14) in (10) gives 


$$L = \left(\frac{g}{c}\right)^{k/(c+k)}\left(\frac{h}{c}\right)^{c/(c+k)} \tag{15}$$


as the learning ability parameter for equation (1). 
In the case of equation (2) we find that when its derivative, (6),
is unity,

$$w_1 = \frac{h}{c}\log\frac{g}{h}. \tag{16}$$

Substituting (16) in (2) gives

$$u_1 = \frac{g}{c}\left(1 - \frac{h}{g}\right). \tag{17}$$

Substituting (11) and (17) in (10) we have

$$L = \frac{h}{c} \tag{18}$$


as the learning ability parameter for equation (2). It is interesting 
to note that this parameter is frequently regarded as representing a 
“rate of change” for the negative exponential curve. 
For equation (3), the special case in which c = k, we see from
equation (8) that when the derivative equals unity

$$w_1 = \frac{\sqrt{hg} - h}{c}. \tag{19}$$

Substituting (19) in (3) gives

$$u_1 = \frac{g}{c}\left[1 - \sqrt{\frac{h}{g}}\right]. \tag{20}$$

Substituting (11) and (20) in (10) we find that

$$L = \frac{\sqrt{gh}}{c}, \tag{21}$$

or the geometric mean of the two parameters indicated in equation
(3), the case c = k. This quantity is the semi-major axis of the hyperbola
which Wiley and Wiley (11) have suggested as a measure of
learning ability.

To summarize briefly, equations (5), (7), (9), (15), (18) and 
(21) indicate how we may calculate, from the parameters of the cu- 
mulative error curve, two quantities, one representing learning ability 
and the other initial preference. 
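
The learning ability parameter may be computed either from its definition, L = u_0 - u_1, or from the closed forms, and the two should agree. The sketch below is illustrative only. It assumes the form u = (g/c)[1 - (h/(kw + h))^(c/k)] for equation (1), and the constants and function names are hypothetical.

```python
import math


def errors_at_unit_slope(g, h, c, k):
    """Cumulative errors u_1 at the point where du/dw = 1."""
    if k == 0:
        w1 = (h / c) * math.log(g / h)                    # equation (16)
        return (g / c) * (1.0 - math.exp(-c * w1 / h))    # equation (2)
    w1 = (g / k) ** (k / (c + k)) * (h / k) ** (c / (c + k)) - h / k
    return (g / c) * (1.0 - (h / (k * w1 + h)) ** (c / k))


def learning_ability(g, h, c, k):
    """L = u_0 - u_1, equation (10), with u_0 = g/c from equation (11)."""
    return g / c - errors_at_unit_slope(g, h, c, k)


def learning_ability_closed_form(g, h, c, k):
    """Equation (15); gives h/c when k = 0 and sqrt(g*h)/c when k = c."""
    a = k / (c + k)
    return (g / c) ** a * (h / c) ** (1.0 - a)


def initial_preference(g, h):
    """Errors per trial at the beginning of training, g/h."""
    return g / h


if __name__ == "__main__":
    g, h, c = 5.0, 4.0, 2.0          # hypothetical constants
    for k in (0.0, 0.5, c):
        print(k, learning_ability(g, h, c, k),
              learning_ability_closed_form(g, h, c, k))
```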

An illustration of the type of distinction which can be made by 
determining the learning ability and the initial preference parameters 













is given by the data shown in Figure 1. Here we have two animals 
each of which took 100 trials to reach the criterion. Animal number 
3 made 35 errors in reaching the criterion, and animal number 40 
made 29 errors. Judging then by total trials and errors, one would 
say that while there is not much difference between these two animals, 














[Figure 1: cumulative errors plotted against trials for animals number 3
and number 40, showing the common asymptote g/c = 37 and the h/c values.]

    Animal number       3        40
    trials            100       100
    errors             35        29
    g/h              1.33      .592
    h/c              27.8      62.5
    g/c               37.       37.

FIGURE 1
Two records illustrating the type of differentiation made possible by
the analysis in terms of parameters of the learning curve.
there is a slight indication that animal number 40 is the better learn- 
er, since he made fewer errors than number 3 in reaching the cri- 
terion. Now let us look at the information given by the learning abil- 
ity and initial preference parameters. Both animals are approaching 
the same asymptote—g/c = 37 errors for each animal. However, the 
initial preference parameters are markedly different. g/h is 1.33 for 
number 3, and is only .592 for number 40. In other words, at the be- 
ginning of learning number 3 had a big handicap to overcome. He 
preferred the incorrect to the correct stimulus. Number 40, on the 
other hand, definitely preferred the correct stimulus. These two val- 
ues of g/h are shown on the graph by the two straight lines tangent 
to each curve at the origin. The h/c values, or the estimated number 
of errors to progress from the point of one error per trial to the point 
of no errors per trial, is also shown on the graph. This value is 27.8 
for animal number 3 and 62.5 for animal number 40. In the case of 
animal number 40, in order to diagram the h/c value, it is necessary 
to project the curve backward to the point of one error per trial, 













which would be estimated to have come about 33 trials prior to the 
beginning of training. Instead, then, of saying that animal number 
40 is possibly slightly superior to animal number 3, as would be done 
by using only the total trial and error scores, we would say on the 
basis of an analysis in terms of parameters of the learning curve that 
animal number 3 learned much faster than animal number 40. The 
fact that animal number 40 made fewer mistakes in reaching the criterion
is accounted for by the fact that at the beginning of training
he had a preference for the correct stimulus, while animal number 3 
had a preference for the incorrect stimulus. We can see then that the 
analysis in terms of parameters of the learning curve makes it pos- 
sible to distinguish between records which look very similar in terms 
of the conventional analysis based on total trials and total errors. 
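
The figures quoted for the two animals can be checked against one another. The sketch below is illustrative only. It assumes that both records were fitted with equation (2), the case in which L = h/c, which is suggested by the use of h/c in the discussion but is not stated outright. The function names are hypothetical.

```python
import math

ASYMPTOTE = 37.0                   # g/c, the same for both animals
H_OVER_C = {3: 27.8, 40: 62.5}     # learning ability parameter h/c


def initial_preference(animal):
    """g/h, obtained as (g/c) divided by (h/c)."""
    return ASYMPTOTE / H_OVER_C[animal]


def trial_of_unit_slope(animal):
    """Trial at which one error per trial is made, equation (16).

    A negative value places the point before the start of training.
    """
    return H_OVER_C[animal] * math.log(initial_preference(animal))


def fitted_errors(animal, trials):
    """Cumulative errors after the given number of trials, equation (2)."""
    return ASYMPTOTE * (1.0 - math.exp(-trials / H_OVER_C[animal]))


if __name__ == "__main__":
    for animal in (3, 40):
        print(animal,
              round(initial_preference(animal), 3),
              round(trial_of_unit_slope(animal), 1),
              round(fitted_errors(animal, 100), 1))
```

Under this assumption the ratios return the g/h values of 1.33 and .592, the point of one error per trial for animal number 40 falls about 33 trials before training began, and the fitted totals at 100 trials lie within about one error of the observed 35 and 29.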

As Ruch and Thorndike point out, the principal difficulty which 
the method of common points of mastery attempts to surmount is the 
difficulty of determining learning ability as distinct from initial abil- 
ity. It has been shown how, in the case of cumulative error records, 
the best fitting equation may be used to estimate learning ability ac- 
cording to the pattern suggested by the method of common points of 
mastery. The present method would, however, be free from some of 
the difficulties of the method suggested by Ruch and Thorndike. The 
parameters would be determined from all the data, thus avoiding the 
discarding of much of the data collected. Since the “physiological 
limit” approached by all the animals is “zero errors per trial,” one
does not have the difficulty of dealing with some subjects who are 
near and others who are far from their maximum attainment. How- 
ever, the criticism raised by Ruch regarding possible dependence of 
the rate of progress after a given point on the method by which that 
point was reached would still hold for this method. 


Application 


In the experiments dealing with the effect of cortical extirpation 
on the learning ability of rats, it has been difficult to determine the 
extent to which a large error or trials score is due to a low learning 
ability, and the extent to which it is due to an initial preference for 
the stimulus which the experimenter selected as the negative stimulus. 
We will turn now to a consideration of this difficulty encountered in 
determining the effect of cortical extirpation on learning ability, and 
show how the analysis in terms of learning curve parameters may be 
applied to these data. 

Lashley (5) tested normal and operated rats on a brightness 
discrimination problem, with light as the positive and dark as the 













negative stimulus. He found that the destruction of a rather large per- 
centage of cortical tissue did not significantly lower learning ability 
as measured by the conventional criteria of total errors or total trials 
to a given criterion. 

If anything, the evidence showed that the operation improved the 
learning ability of the animals. Various explanations were suggested 
for the phenomenon. Lashley (5) suggested that brightness discrimi- 
nation was really a very simple problem requiring only one associa- 
tion. He has since, however, (6) questioned the adequacy of this ex- 
planation. Herrick (2) suggested that brightness discrimination is 
mediated by a thalamic mechanism. 

Krechevsky (4) repeated the experiment using conditions slightly 
different from those used by Lashley. Lashley ran both operated and 
normal animals with light as the positive and dark as the negative 
stimulus, and with shock as punishment for the incorrect response. 
Krechevsky ran six groups of animals in all. Two groups, one operated 
and the other normal, were run with shock as punishment for the in- 
correct response, but the learning problem was the opposite of that 
used by Lashley, dark being the positive stimulus, and light the nega- 
tive stimulus. The other four groups were trained without the use of 
shock as punishment for the incorrect response. Of these four groups, 
two (one operated, and one normal) were trained dark positive;
while the other two (one operated and the other normal) were trained
light positive. Table 1 summarizes the data obtained by Krechevsky
(4) showing the conditions under which each group was run, the
number of animals in each group, the average errors and trials to a
criterion of 19 correct responses in 20, and the average per cent of the
cortex destroyed before the beginning of learning for each of the
operated groups.


TABLE 1
Summary of Krechevsky’s data, giving the experimental conditions, average trial
and error scores, and average size of lesion for each of the six groups
of animals used. (Compiled from reference 4.)

                                                              Average Per
            Positive                     Average   Average   Cent of Cortex
Cortex      Stimulus   Punishment    N   Errors    Trials      Destroyed
Normal      dark       shock        22    19.8      41.8
Operated    dark       shock        21    24.6      53.3         14.8
Normal      dark       no shock     21    23.9      64.3
Operated    dark       no shock     22    37.9      84.1         12.2
Normal      light      no shock     10    22.2      64.0
Operated    light      no shock     14    15.9      60.0         14.7

Since the operated animals learned the light positive problem 
much faster than they did the dark positive problem, while this was
not true for normal animals, Krechevsky concluded that the cerebral 
lesions caused a brightness preference reversal in the rat. This con-
clusion was also in agreement with his previous studies on the effect
of lesions on “hypotheses” (3). 

Because of the great difference between operated and normal ani- 
mals when learning the dark positive habit without punishment, and 
the negligible difference between operated and normal animals when 
learning the same dark positive habit with shock as punishment, he 
concludes that “operated animals are inferior to normal animals in 
the capacity to reverse their preferred mode of response when the 
situation requires it, when no punishment is used.” It is also pointed 
out that this great difference between normal and operated animals 
when trained without shock and negligible difference when using 
shock indicates that ‘a possible effect of a cerebral lesion is the lower- 
ing of the ‘level of attention’ or ‘vigilance’ of the animal.” 

It should be noted that with an analysis based on total trials and 
total errors, while one can tell that the operation altered the relative 
difficulty of the light positive and dark positive habits, it is not possi- 
ble to tell if this alteration of preference is the only change that took 
place. Is the amount of change in error and trial scores produced 
solely by the preference change, or did the learning ability of the 
animals change also? When an analysis is made in terms of total 
errors and total trials, which reflect some unknown composite of 
learning ability and initial preference, it is impossible to answer such 
a question. If on the other hand one parameter of the learning curve 
can be taken as a reasonable approximation of learning ability and 
another can be taken as a reasonable approximation of initial prefer- 
ence, it will be possible to determine whether the operation affected 
both of these parameters, or only one, and if so, which one. 

Krechevsky has shown that at least some of the change in total 
trial and error scores produced by a cortical operation is due to an 
effect on preference, and not to an effect on learning ability. How- 
ever, in the case of the faster learning with shock than without shock, 
it is assumed that the effect is entirely due to an effective speeding 
up of learning which may possibly be a change in “attention” or “vigi- 
lance.” It was not even thought worth while to run animals under 
the conditions of light positive with shock which would enable one to 
determine whether or not some brightness preference existed under 
conditions of shock for punishment, as well as under the no shock 
conditions. However, as pointed out above in connection with the 
study of the effect of the cortical operation, even if one had the data 
on trials and errors under the condition “light positive, with shock 
for punishment” it would still be impossible by conventional methods
to tell whether or not the change was solely due to a preference change, 
or whether it was partly preference and partly learning ability. 

In general, then, we find that analyzing data in terms of total 
errors and total trials does not enable one to distinguish between 
changes in learning ability and changes in initial preference. It has 
been tacitly assumed that a reversal of positive and negative stimulus 
could not reasonably produce a change in learning ability, and hence 
changes in trial and error scores produced by such a reversal could 
be taken as an indication of preference. Similarly, it has not seemed 
reasonable to attribute the change in trial and error scores produced 
by introducing an electric shock to a change in preference; therefore, 
the change has been attributed to a temporary change in learning 
ability. Since this change is temporary while learning ability is 
thought of as something relatively permanent, the change is referred 
to as “attention” or “vigilance.” Likewise, it has been assumed that 
any change in total trials or errors produced by cortical destruction 
must be a change in learning ability, since it seemed that such de- 
struction was more likely to alter learning ability than to alter initial 
preference. By using a separate technique for measuring initial pref- 
erence, Krechevsky has shown that the assumption that changes due 
to cortical destruction are changes in learning ability is not wholly 
correct. 

It is hoped that by analyzing learning data in terms of learning 
curve parameters so chosen that one gives an estimate of learning 
ability for each animal and another gives an estimate of initial pref- 
erence, it will be possible to determine the conditions which affect 
learning ability, those which affect initial preference, and those which 
affect both. 

An analysis was made of Krechevsky’s data by the method pro- 
posed in the first part of this paper, in order to see if this hope would 
be realized. This method consists of fitting a learning curve to each 
individual learning record, determining the parameters representing 
initial preference and learning ability, and then studying the fre- 
quency distributions of these parameters.* 

*In order to make this study it was necessary to use the original data sheets
giving the number of errors for each animal in each trial. The author wishes to
thank Dr. Krechevsky for making the original data available for this purpose.

In carrying out this analysis, the preferred method would be to
fit the general form, equation (1), to each of the individual learning
curves, and so determine the c/k parameter independently for each
learning record. Since equation (1) is very difficult to fit at present,
it is much more feasible to consider as a first approximation, only the
three special cases previously discussed, in which c = k, c = 0, and
k = 0. These three cases correspond to c/k values of one, zero, and
infinity respectively. The quickest method of deciding which of these 
three c/k values gives the best fit is first to try the case c = k, fitting 
the learning curves by the graphic method described by Thurstone 
(10). If the plot of u against u/w is a straight line, then the c/k
parameter equals one. If this plot is curved, with the convex side to-
ward the origin, then c is less than k and the extreme case in this
direction, namely c = 0, can be tried. If the plot of u against u/w is 
concave toward the origin, then k is less than c and the extreme value 
in the other direction, k = 0, may give a satisfactory fit. If the fit is 
unsatisfactory for all three of these special cases, it would be neces- 
sary to use other trial values of c/k. When the foregoing procedure 
was applied to Krechevsky’s data, it was found that equation (2) for 
the case where k = 0, or where reward has a negligible effect in com- 
parison with the effect of punishment, gave a reasonably satisfactory 
fit for most of the cases. Therefore the value of k was taken as zero 
for the present set of records. 

It might be noted here that from the rational setup, one would 
anticipate that in the situations where shock was used as a punish- 
ment, reward might have a relatively slight effect, and in comparison 
with punishment might be close enough to zero not to be noticeable, 
but in the groups trained without shock one would not expect this to 
be true. However, Krechevsky’s method of recording errors was to 
record a maximum of one error per trial even though the animal may 
have made a large number of attempts to go through the locked door 
in any given trial. Since this would happen particularly at the begin- 
ning of each learning record, it can be shown that the type of distor- 
tion such a method of tabulating errors would introduce into the data 
would be in the direction of having a very small k (effect of reward) 
relative to c (effect of punishment). Since the actual number of er- 
rors preceding each correct response was not recorded and could not 
be estimated accurately from the data available, the only possible pro- 
cedure in this case was to use that form of the equation which gave 
the best empirical fit. 

It should be pointed out that in the Yerkes discrimination box 
setup used for this work by Lashley and Krechevsky, it is very diffi- 
cult to determine how many “punishments” the animal receives. He 
may push the wrong door several times or only once before leaving 
an alley; he may push very hard or not; and he may spend a long or 
a short time in the blind. The Lashley jumping box avoids these diffi- 
culties, since after a single wrong choice the animal falls into the net 
and must start again from the jumping platform. This characteristic 
of the jumping method makes it very easy to count the number of
punishments the animal receives, so that the jumping method should 
be better than discrimination box methods for checking on a theory 
which involves the “number of punishments.” 

However, as Woodrow (12) has pointed out, such characteristics 
of a curve as its asymptote, the rate of change, or the initial level are 
about the same for any equation that gives a close fit to a given set 
of data. 

The question of goodness of fit is an important one in a study of 
this sort. Since it is both impractical and uninformative to present 
110 graphs showing the fit to each of the learning records, the follow- 
ing method of indicating goodness of fit was used. The theoretical 
and observed number of cumulative errors were calculated for every 
tenth cumulative trial. The differences were squared, summed and 
divided by an approximation to the number of degrees of freedom 
available in this method of fitting, and the square root of this quo- 
tient was recorded for each learning curve. Since the origin, as well 
as every tenth trial, was considered in fitting, there were T/10 + 1 
points available, where T is total number of trials. Equation (2) has
two parameters, so two was deducted from the number of points, giv-
ing T/10 - 1. Taking this value as an approximation to the number
of degrees of freedom, we can divide the sum of the squares of the 
discrepancies for each curve by the number of degrees of freedom. 


TABLE 2
Frequency Distribution of “Goodness of Fit”
for Each of the Six Groups and the Total

Root mean         Dark, Shock        Dark, No shock      Light, No shock
square
discrepancy     Normal  Operated    Normal  Operated    Normal  Operated   Totals
0.3 - 0.4          1       1           1       -           1       -          4
0.5 - 0.9         11      10          13       7           7       9         57
1.0 - 1.4          4       7           6       6           1       4         28
1.5 - 1.9          6       4           -       7           -       -         17
2.0 - 2.4          -       1           1       2           1       -          5
Totals            22      23          21      22          10      13        111


The square root of this quotient is a measure of goodness of fit which 
is roughly comparable for records of varying length and represents 
the average discrepancy between the theoretical curve and the experi- 
mental data. I do not know of any good measure of experimental er- 
ror to be used here in order to test the significance of the difference 
between the goodness of fit and experimental error. However, this 
measure indicates the average number of errors which separate the 
experimental and theoretical points. Table 2 gives the frequency
distribution of this measure of goodness of fit for each of the six
experimental groups. An impression of the accuracy indicated by this
measure is given by the sample graphs in Figure 2. These graphs show
the poorest fit, one of the best, and two intermediate fits.

FIGURE 2
[Four sample graphs of individual learning records plotted against
cumulative trials, each labeled with its group, rat, and fitted g/h and
h/c values. The graphs are not legible in this copy.]

It can be seen that in no case was this average discrepancy less 
than 0.3 errors, nor more than 2.4 errors. Only five of the 111 cases 
showed a discrepancy in the 2.0 to 2.4 range, and over half or 61 
cases showed an average discrepancy less than one error. 
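
The goodness-of-fit measure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's own computation: the function name `rms_discrepancy` and the sample record are hypothetical, and the fitted values are assumed to come from equation (2) evaluated at the origin and at every tenth trial.

```python
import math


def rms_discrepancy(observed, fitted, total_trials):
    """Root mean square discrepancy between observed and fitted cumulative errors.

    observed, fitted: cumulative errors at trials 0, 10, 20, ..., total_trials.
    The degrees of freedom are approximated as T/10 - 1, that is, the
    T/10 + 1 points used in fitting less the two parameters of the curve.
    """
    n_points = total_trials // 10 + 1
    if len(observed) != n_points or len(fitted) != n_points:
        raise ValueError("expected one value for the origin and each tenth trial")
    dof = n_points - 2
    if dof <= 0:
        raise ValueError("record too short for this approximation")
    sum_of_squares = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(sum_of_squares / dof)


# Hypothetical 40-trial record (invented numbers, not Krechevsky's data).
observed = [0, 8, 13, 15, 16]
fitted = [0.0, 7.0, 12.0, 16.0, 16.0]
example_rms = rms_discrepancy(observed, fitted, 40)
```

Dividing by T/10 - 1 rather than by the number of points is what makes the measure roughly comparable across records of different length.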

Let us first consider the data on the initial preference parameter
which is summarized in Table 3 and Figure 3. Figure 3 gives the
frequency distribution of the g/h parameter for each of the six groups
of animals studied. Table 3 gives the mean and standard deviation
of each of these distributions.


TABLE 3
Average Initial Preference Parameter and Standard Deviation
of the Distribution of Initial Preference Parameter
for Each Group of Animals

Group               Positive
No.     Animal      Stimulus    Punishment     N      M       σ
I       Normal      Dark        Shock         22     .87     .23
II      Operated    Dark        Shock         23     .95     .21
III     Normal      Dark        No shock      21     .77     .23
IV      Operated    Dark        No shock      22     .92     .21
V       Normal      Light       No shock      10     .63     [illegible]
VI      Operated    Light       No shock      13     .46     .23

It would have been possible to run a straightforward analysis of
variance on these data, if there had been a group of operated and of
normal animals run with shock as punishment and light as the positive
stimulus. Since these two groups were not included in the experiment,
it will probably be better to use two separate analyses of
variance, one for the comparison of the four groups all trained dark
positive, and the other for the four groups trained without shock for
punishment. These analyses were made using the method for
disproportionate subclass numbers described by Snedecor (8) and are shown
in Table 4. Table 5 presents the more important group differences
analyzed by the t-test.


TABLE 4
Analysis of Variance for the Initial Preference Parameter (g/h)

A. Analysis of variance for groups trained dark positive

                  Degrees of     Mean
                  freedom        Square         F         p
Operation             1          .0134        6.172     .05-
Shock                 1          .0050        2.303     .05+
Interaction           1          .0012
Error                84          .0022

B. Analysis of variance for groups trained without punishment

                  Degrees of     Mean
                  freedom        Square         F         p
Operation             1          .000079
Light vs. dark        1          .087498     29.721     .01-
Interaction           1          .025440      8.641     .01-
Error                62          .002944

The first half of Table 4 gives the analysis of variance for the
four groups trained dark positive. The error variance or variance due
to individual variability between the animals within the groups is
.0022. On inspecting the variances due to the difference between the
operated and normal groups, we see that this variance of .0134 is over
six times the error variance which gives an F that is significant at
the five per cent level. This significant difference between operated
and normal animals, taken in conjunction with the actual values of
the means given in Table 3, leads to the conclusion that the effect of
the operation on the animals trained dark positive is to significantly
increase the g/h parameter. Since this means an increased preference
for the incorrect response on the part of the operated animals, and
since the incorrect response is “going to the light,” we can say that
the effect of the operation was to increase the rats’ preference for the
light as opposed to the dark alley in the Yerkes discrimination box.


TABLE 5
t-test for Data on the Initial Preference Parameter (g/h)

Groups                                               Degrees of
Compared      M1 - M2      σ (diff.)       t         Freedom        p
II - I          .08          .07          1.24          43         .2+
IV - III        .15          .07          2.27          41         .05-
V - VI          .17          .08          2.10          21         .05-
III - V         .14          .08          1.76          29         .10-
IV - VI         .46          .08          6.04          33         .01-

It may be noted that the variance due to the difference between 
the shock and no-shock groups and that due to the interaction between 
shock and operation are not significantly greater than error variance. 
This means that the change from the shock to the no-shock conditions 
had no effect on initial preference and that the effect of the operation 
was essentially the same for the shock and the no-shock groups. 

The analysis of variance for the four groups trained without 
shock as punishment, shown in the last half of Table 4, shows an er- 
ror variance of .0029, which is essentially the same as the error vari- 
ance given in the first half of the table. However, the variance due
to the difference between the operated and normal animals is only
.000079, or about one-thirtieth of the error variance. From these 
data, then, one might conclude that the operation had no effect on the 
g/h parameter, which is the opposite of the conclusion reached by in- 
specting the first half of Table 4. However, if we look at the rest of 
Table 4, it will be clear that there is no contradiction between the two 
halves. Since the change produced by the operation is an increased 
preference for the light, one would expect the g/h parameter to in- 
crease with the operation in animals trained dark positive, but to de- 
crease with the operation in animals trained light positive. An in- 
spection of Table 3 shows that this has happened for the four groups 
trained without shock. For the animals trained dark positive, the av- 
erage g/h parameter is .15 higher for operated than for normal ani- 
mals. For the animals trained light positive the average g/h para- 
meter is .17 lower for operated than for normal animals. These oppo- 
site effects cancel each other so that the analysis of variance shows 
that no significant difference was produced by the operation. How- 
ever, the variance due to interaction between the direction of training 
and operation is over eight times the error variance, which shows that 
the effect of the operation on animals trained dark positive is sig- 
nificantly different from the effect of the operation on animals trained 
light positive. 
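
The variance ratios quoted in this discussion can be checked directly from the mean squares in the second half of Table 4. The sketch below is only an arithmetic illustration; the helper name `f_ratio` and the variable names are hypothetical, and the two shifts are the differences in mean g/h quoted in the text.

```python
def f_ratio(effect_mean_square, error_mean_square):
    """Variance ratio F: an effect's mean square over the error mean square."""
    return effect_mean_square / error_mean_square


# Mean squares for the four no-shock groups (Table 4, part B).
error_ms = 0.002944
f_direction = f_ratio(0.087498, error_ms)    # light vs. dark
f_interaction = f_ratio(0.025440, error_ms)  # operation x direction of training
f_operation = f_ratio(0.000079, error_ms)    # operation main effect

# Shift in mean g/h produced by the operation, as quoted in the text:
# upward for dark-positive training, downward for light-positive training.
shift_dark_positive = 0.15
shift_light_positive = -0.17
net_shift = shift_dark_positive + shift_light_positive
```

The two shifts nearly cancel when averaged over direction of training, which is why the operation main effect is far below the error variance while the interaction is more than eight times as large.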

This interpretation of the significant interaction variance is also 
borne out by the t-test results shown in Table 5. The g/h parameter 
for the operated, dark, no-shock group is significantly greater than 
that for the corresponding normal group, and in the case of the groups 
trained light positive the g/h parameter is significantly smaller for 
the operated than for the normal animals. All the evidence points 
clearly, then, to the conclusion that the effect of the operation is to 
increase the animal’s preference for the light. 

The last half of Table 4 also shows that for the animals trained 
without shock the variance due to the difference between the groups 
trained light positive and the groups trained dark positive is signif- 
icant, being practically thirty times the error variance. That is, the 
preference for the light is stronger than the preference for the dark. 
In the last two lines of Table 5, however, we find that this difference 
due to the change from light positive to dark positive is significant 
for the two operated groups, but not for the two normal groups. On 
the basis of this evidence, then, it seems safe to conclude that the 
operated animals prefer the light to the dark, but for normal animals 
it is not clear whether the preference is in the same direction, or is 
about equal for light and dark. 

The foregoing analysis of the material on the initial preference
parameter may be summarized by pointing out that the effect of the 
operation was to increase the preference for the light; that the pref- 
erence for the light was clearly greater than that for the dark in the 
case of operated animals; that normal animals under these conditions 
clearly do not have a preference for the dark, but whether it is for the 
light, or about a fifty-fifty preference is not clear; and finally that the 
change from shock to no-shock did not affect the initial preference 
parameter. 

FIGURE 3
Distributions for g/h (initial preference).
[Six frequency distributions, one for each group: Normal, Dark, Shock;
Operated, Dark, Shock; Normal, Dark, No Shock; Operated, Dark, No Shock;
Normal, Light, No Shock; Operated, Light, No Shock. The graphs are not
legible in this copy.]

In addition to the foregoing points where the analysis corroborated
one’s expectations, attention should be called to another point
where this is not so. If the change from light positive to dark positive
does not influence the initial preference, then it would be expected
that in general the initial preference under one training condition
would be the reciprocal of the corresponding parameter obtained under
the other training condition. If for example the preference for
the light is twice as strong as that for the dark, then under the dark
positive conditions one would expect g/h to equal two, while under
the light positive condition g/h should equal one-half. The experi- 
mental g/h values found were uniformly too low for such an expecta- 
tion. The product of the average value found under the dark positive 
condition and the average value found under the light positive con- 
dition is 0.4 or 0.5 instead of 1.0. While no statistical test of the 
significance of this difference is available, it is probable that the g/h 








HAROLD GULLIKSEN 189 


values are uniformly too low. It may be noted that the comment re- 
garding the practice of counting only one error per trial made earlier 
in this paper is relevant again here, since the effect of counting every 
additional error would in general be to increase the number of errors 
made at the beginning of learning, which would increase the initial 
slope of the curve and thus increase the g/h parameter. 
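
The reciprocal expectation can be checked with the group means for the no-shock groups. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the normal dark-positive mean is taken as .77, the value implied by the .15 difference between operated and normal animals quoted earlier.

```python
# Mean g/h for the four groups trained without shock (Table 3).
mean_g_over_h = {
    ("normal", "dark"): 0.77,    # implied by the .15 difference quoted in the text
    ("operated", "dark"): 0.92,
    ("normal", "light"): 0.63,
    ("operated", "light"): 0.46,
}


def reciprocity_product(animal):
    """Product of the dark-positive and light-positive mean g/h values.

    If the two parameters were reciprocals of each other, as the argument
    in the text expects, this product would be 1.0.
    """
    return mean_g_over_h[(animal, "dark")] * mean_g_over_h[(animal, "light")]


product_normal = reciprocity_product("normal")
product_operated = reciprocity_product("operated")
```

Both products fall well short of 1.0, in line with the 0.4 or 0.5 reported above.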

Next let us consider the data on the learning ability parameter.
Figure 4 gives the frequency distributions of this parameter for each
of the six groups of animals. Table 6 gives the means and standard
deviations of each of these distributions, together with the values of
the logarithm to the base e of these standard deviations, in order to
facilitate comparisons of them by the z-test. The information on the
z-test was not included in the analysis of the g/h parameter, because
Bartlett’s test for the homogeneity of variances (8) showed homogeneity
both for the six groups as a whole, and for each of the two
subsets of four groups considered. This was not true in the case of
the learning ability parameter, so that a further analysis of the
differences in variance was necessary. It was found that the standard
deviation of the learning ability parameter for the groups trained with
shock was significantly smaller than that of the corresponding groups
trained without shock. These significant z-values are shown in Table
7. The other comparisons between groups which differed in only one
respect showed that the variances could be regarded as the same;
hence these comparisons are not given.


TABLE 6
Average Learning Ability Parameter and Standard Deviation of the
Distribution of the Learning Ability Parameter for
Each Group of Animals

                                        N       M        σ      log_e σ
Normal      Dark     Shock             22     27.6     13.8     2.624
Operated    Dark     Shock             23     30.5     12.7     2.541
Normal      Dark     No shock          21     42.3     21.5     3.068
Operated    Dark     No shock          22     52.1     20.4     3.015
Normal      Light    No shock          10     44.8     16.3     2.791
Operated    Light    No shock          13     57.6     22.8     3.127
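
The z-values in these comparisons follow from the standard deviations in Table 6. The sketch below assumes that the z in question is Fisher's z, half the natural logarithm of the variance ratio, which equals the difference between the natural logarithms of the two standard deviations; the function and variable names are hypothetical.

```python
import math


def fisher_z(sd_larger, sd_smaller):
    """Fisher's z for two variances.

    z = 0.5 * ln(s1^2 / s2^2) = ln(s1) - ln(s2), which is why tabulating
    the natural log of each standard deviation makes the comparison easy.
    """
    return math.log(sd_larger) - math.log(sd_smaller)


# Standard deviations and group sizes for the dark-positive groups (Table 6).
sd = {"I": 13.8, "II": 12.7, "III": 21.5, "IV": 20.4}
n = {"I": 22, "II": 23, "III": 21, "IV": 22}

# No-shock group against the corresponding shock group.
z_normal = fisher_z(sd["III"], sd["I"])      # Table 7 reports .444
z_operated = fisher_z(sd["IV"], sd["II"])    # Table 7 reports .474

# Degrees of freedom for each variance: group size less one.
dof_normal = (n["III"] - 1, n["I"] - 1)
dof_operated = (n["IV"] - 1, n["II"] - 1)
```

Values computed from the rounded standard deviations agree with the tabled z-values to within about .001; the small gap is presumably rounding in the tabled logarithms.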

Inspection of the mean values in Table 6 shows that the average 
learning ability parameter is in every instance larger for the operated 
group than for the corresponding normal group. Since this parameter 
indicates the number of errors made between two “equivalent” points 
of mastery, we see that the operation decreased the learning ability 
of the animals. The learning ability parameter is also larger for the 
animals trained without shock than for those trained with shock, and 











190 PSYCHOMETRIKA 


there is a slight difference due to direction of training, the learning 
ability parameter being larger for the groups trained light positive 
than for the corresponding group trained dark positive. Let us turn 
to the results obtained from the analyses of variance in Table 8 to 


TABLE 7 


Data on the z-Test of Difference of Variance Between Groups Trained 
with Shock (Groups I and II) and the Corresponding Groups 
trained without Shock (Groups III and IV) 

Zz n, Ny p 
III-1 .444 20 21 .05— 
IV-II .474 21 22 .05— 


TABLE 8 


Analysis of Variance for the Learning Ability Parameter (h/c) 
A. Four groups trained dark positive 


Degrees of Mean 


freedom Square F p 
Operation 1 39.94 2.879 05+ 
Shock 1 828.33 23.662 .01— 
Interaction 1 11.97 
Error 84 13.88 


B. Four groups trained with no shock 
Degrees of Mean 


freedom Square F p 
Operation 1 127.24 4.380 .05— 
Light vs. dark 1 16.24 
Interaction 1 2.25 
Error 62 29.05 


see which, if any, of these trends are statistically significant. 

The analysis of the four groups trained with no shock, given in
the last half of Table 8, shows an error variance of 29.05. The only
other variance which is significantly greater than this is the variance 
due to the difference between the operated and normal groups. This 
means that we may conclude that the operation did increase the learn- 
ing ability parameter. The change from the light positive to the dark 
positive conditions, however, had no effect upon the learning ability 
parameter. 

With respect to the influence of the change from no shock to 
shock upon learning ability, we find first that the standard deviation 
of the learning ability parameter for the animals trained with shock 
is significantly less than the standard deviation of this parameter for 
those trained without shock. This result was obtained by Bartlett’s 
test for homogeneity of variance (8), the results for which have not 
been shown, and by the individual comparisons made by the z-test,
which are shown in Table 7. With such a difference in variance, there 
is no rigorously valid test for a difference in means. However, the 
conventional analysis of variance, which assumes homogeneity of 
variance, was used, and the results are shown in the first half of Table 
8. We find that the variance due to the change from shock to no 
shock is over twenty times the error variance (23.66). Since the ab- 
solute magnitude of this difference is also rather great, being 14.7 
points for the normal groups, and 21.6 for the operated, it probably 
is fairly accurate to say that the effect of the shock was to decrease 
both the mean and the standard deviation of the learning ability
parameter. An examination of the frequency distributions of this
parameter, given in Figure 4, shows that the lower bound of these
distributions is about the same for all the groups, but the large values
which occur frequently in the no shock groups are conspicuously absent
in the shock groups. It seems that the slow learners were helped
by the introduction of the shock, while the fast ones were not.
Krechevsky (4) points out that the “use of electrical shock for
punishment tends to decrease such differences in capacity to learn between
the operated and normal animals.” According to the present analysis
there is not only a decrease in the difference between the operated and
normal animals but also a change in the learning ability parameter
for each of these two groups separately. It is as if under the no-shock
conditions some of the animals were adequately motivated to solve the
problem; the addition of the shock did not improve their learning
ability. The shock, however, did increase the motivation, “effort,”
“attention,” or “vigilance” of those who were poorly motivated under the
no-shock conditions.

FIGURE 4
Distributions for h/c (learning ability).
[Six frequency distributions, one for each group: Normal, Dark, Shock;
Operated, Dark, Shock; Normal, Dark, No Shock; Operated, Dark, No Shock;
Normal, Light, No Shock; Operated, Light, No Shock. The graphs are not
legible in this copy.]

The effect of the shock was to make the group superior and more 
homogeneous by eliminating the high error scores, although the shock 
did not enable the best animals to improve their performance. 

It might also be noted that in the analysis of variance for the 
four groups trained dark positive we find that while the effect of 
shock is significant, the effect of the operation is not. I would not in-
terpret this as casting doubt upon the significant effect of the opera-
tion found in the analysis of the four no-shock groups. This non-sig-
nificance of the operation probably means that the effect of the shock
was to reduce the difference between the normal and operated animals. 
This reduction is so great that not only is the difference between the 
operated and normal animals trained with shock insignificant, but 
even when combined with the results of the animals trained dark 
positive without shock the totals are not significantly different. 

It might be pointed out that it is possible that if the original data 
showing that operated rats performed just slightly better than nor- 
mal rats on light discrimination problems had been dealt with by the 
method developed in this paper, it would have been discovered that 
the change was in initial preference and that the change in learning 
ability was not a non-significant increase for operated rats, but a sig- 
nificant decrease. 


Summary 


We may summarize the foregoing material by pointing out that 
we have the following results. Changing the problem from light posi- 
tive to dark positive does not alter the learning ability parameter, but 
does change the initial preference parameter. The operated animals 
prefer the light in the Yerkes discrimination box. For the normal 
animals the tendency to go to the light is either stronger than or equal 
to the tendency to go to the dark. It should be noted that in the con- 
ventional analysis in terms of error and trials, it is necessary to 
assume that the change from light positive to dark positive could not
affect learning ability, so that any significant change in error or trial 
score must be attributed to initial preference. With the present meth- 
od of analysis the evidence of a change in initial preference is inde- 
pendent of the evidence for no change in learning ability. 

The change from no shock to shock for punishment does not af- 
fect the initial preference parameter, but does alter the learning abil- 
ity parameter which is larger under no-shock conditions. Similarly, 
here we may note that in the conventional analysis in terms of trials
and errors one assumes that the introduction of the shock could not
change initial preference, and hence the change in error or trial 
scores is attributed to a temporary change in learning ability which 
is termed “attention” or “vigilance.” In the present method of analy- 
sis the evidence for a change in learning ability is independent of the 
evidence for no change in initial preference. 

The effect of the cerebral destruction is to change both paramet- 
ers, increasing the initial preference for the light and decreasing the 
learning ability, except that in the case of the groups trained with 
the shock the difference between the operated and normal animals in 
learning ability was too small to be detected with the number of cases 
used. This may be due to the fact that the effect of the shock on the 
learning parameter was so great as to obscure the effect of the slight 
cerebral destruction. Using the analysis by total errors and total trials 
it is impossible from these data to tell that the operation produces a 
change in both learning ability and initial preference. However, the 
analysis in terms of the parameters of a learning curve shows clearly 
that the operation produced both effects. It is also interesting to note 
a difference between the effect of the operation upon learning ability, 
and the effect of the shock upon learning ability. The operation in- 
creased the mean but did not change the standard deviation of the 
learning ability parameter, that is, all the members of the group grew 
slightly poorer because of the operation. The effect of introducing 
the shock was to decrease both the mean and the standard deviation. 
Differently stated, it was to improve the performance of the poorest, 
but not of the best, animals. This can be taken as indicating that the 
learning ability parameter reflects the degree of motivation, as well as 
a more constant “ability.”


Conclusion 


A method of analyzing learning data in terms of certain learn- 
ing equation parameters which have interesting psychological inter- 
pretations has been developed and illustrated. In this illustration it 
has been shown that the analysis in terms of these parameters has a
clearer psychological interpretation than does the conventional analy- 
sis in terms of total trials and total errors. 


BIBLIOGRAPHY 


1. Gulliksen, Harold. A rational equation of the learning curve based on
Thorndike’s Law of Effect. J. gen. Psychol., 1934, 2, 395-434.

2. Herrick, C. Judson. Brains of rats and men. Chicago: Univ. Chicago Press,
1926, XIV + 382.

3. Krechevsky, I. Brain mechanisms and “hypotheses.” J. comp. Psychol., 1935,
19, 425-468.

4. Krechevsky, I. Brain mechanisms and brightness discrimination learning.
J. comp. Psychol., 1936, 21, 405-446.

5. Lashley, K. S. Brain mechanisms and intelligence. Chicago: Univ. Chicago
Press, 1929, XIV + 186.

6. Lashley, K. S. Studies of cerebral function in learning. XI. The behavior
of the rat in latch box situations. Comparative Psychology Monographs,
1935, 11.

7. Ruch, F. L. The method of common points of mastery as a technique in
human learning experimentation. Psychol. Review, 1936, 43, 229-234.

8. Snedecor, G. W. Statistical methods. Ames, Iowa: The Iowa State College
Press, 1940, XIII + 422.

9. Thorndike, E. L. Adult learning. New York: Macmillan, 1928.

10. Thurstone, L. L. The error function in maze learning. J. gen. Psychol., 1933,
9, 288-301.

11. Wiley, L. E., and Wiley, A. M. Studies in the learning function.
Psychometrika, 1937, 2, 1-19, 107-120, 161-164.

12. Woodrow, Herbert. Interrelations of measures of learning. J. Psychol., 1940,
10, 49-73.











PSYCHOMETRIKA—VOL. 7, NO. 3 
SEPTEMBER, 1942 


TESTS OF STATISTICAL HYPOTHESES IN THE CASE OF 
UNEQUAL OR DISPROPORTIONATE NUMBERS OF 
OBSERVATIONS IN THE SUBCLASSES 


FEI TSAO 
DEPARTMENT OF EDUCATIONAL RESEARCH, UNIVERSITY OF TORONTO 


General solutions of the analysis of variance in the case of unequal
numbers of observations in the subclasses are presented. If
we have k criteria for the classification, there will be k! solutions in
making a complete analysis and $2^{k-1}$ answers, bearing different
meanings, for the sum of squares between subclasses of each criterion.
The sum of squares for the interaction of any order, however,
will be identical in the different solutions of the same problem.


Introduction 


The technique of analysis of variance has been widely used with- 
in recent years to test statistical hypotheses in psychological and edu- 
cational research. The applicable equations are generally concerned 
with the case of equal or proportionate numbers of observations in 
the subclasses. In problems in education and psychology, however, 
the subclasses nearly always consist of unequal or disproportionate 
numbers of subjects. For instance, assume that we give a certain 
test to different grades of different schools and wish to know the dif- 
ferences in ability between grades or between schools. Is it advisable, 
or even possible, for us to make the numbers of all the subclasses ar- 
bitrarily equal or proportionate? For one thing, numerous data would 
have to be sacrificed merely because of the small size of a few sub- 
classes. As an alternative, the writer wishes to present some general 
equations which seem to be more suitable for use in the situation 
which generally exists in the field of education and psychology. The 
equations for equal subclass numbers will be given first, then those 
for unequal ones. For the latter, some special cases which lead into 
the general form will be presented. 


Equations for Equal Subclass Numbers 


Assume that we have given a test to k grades in each of m schools 
and that we wish to compare the ability of pupils in different grades 
or in different schools. First of all, it would be better to determine 
whether the subclasses are similar samples from a common population
by using Welch’s adaptation of the $L_1$ test (4). If we find no
significant difference in the variability of different subclasses, then 
we can combine all the data of different groups. On this point, God- 
ard and Lindquist (1) have made an empirical study which showed 
that heterogeneity of different subclasses had slight influence on the 
tests of significance in the analysis of variance; so, even if we have 
subclasses which are not exactly homogeneous, we can still combine 
the results. We must keep in mind, however, that the tests of hy- 
potheses based on heterogeneous groups would not be so sensitive as 
those on homogeneous ones. 

Next we present some equations based on the assumption that
the data may be combined. If the numbers in different subclasses are
equal and we denote by $X_{sit}$ the score made by the t-th pupil in the
s-th grade of the i-th school, the basic assumption in the analysis of
variance is that we may write

$$X_{sit} = A + B_s + C_i + I_{si} + Z_{sit}, \qquad (1)$$

where $s = 1, 2, \ldots, k$; $i = 1, 2, \ldots, m$; $t = 1, 2, \ldots, n$; k denotes
the number of grades; m denotes the number of schools; and n denotes
the number of pupils in each subclass. A is a measure of the
ability of all the pupils tested and is defined as the mean score for all
individuals in different subclasses; $B_s$ is a measure of the ability of
the s-th grade; $C_i$ is a measure of the ability of the i-th school; $I_{si}$
represents the influence of the interaction between grades and schools;
$Z_{sit}$ represents the error or residual. Since A is defined as the mean
for all groups and individuals, it is necessary that

$$\sum_s B_s = 0; \qquad \sum_i C_i = 0; \qquad \sum_s \sum_i I_{si} = 0. \qquad (2)$$

To obtain the solution, we first write

$$\chi^2 = \sum_s \sum_i \sum_t \{X_{sit} - A - B_s - C_i - I_{si}\}^2. \qquad (3)$$

Minimizing $\chi^2$ with regard to A, $B_s$, $C_i$, and $I_{si}$, we obtain

$$A = \frac{1}{N} \sum_s \sum_i \sum_t X_{sit} = \bar{X}_{...},$$

where $N = kmn$;

$$B_s = \frac{1}{mn} \sum_i \sum_t X_{sit} - A - \frac{1}{m} \sum_i I_{si} = \bar{X}_{s..} - \bar{X}_{...} - \frac{1}{m} \sum_i I_{si};$$

$$C_i = \frac{1}{kn} \sum_s \sum_t X_{sit} - A - \frac{1}{k} \sum_s I_{si} = \bar{X}_{.i.} - \bar{X}_{...} - \frac{1}{k} \sum_s I_{si}; \qquad (4)$$

$$I_{si} = \frac{1}{n} \sum_t X_{sit} - A - B_s - C_i = \bar{X}_{si.} - \bar{X}_{s..} - \bar{X}_{.i.} + \bar{X}_{...} + \frac{1}{m} \sum_i I_{si} + \frac{1}{k} \sum_s I_{si}.$$

A dot in place of a subscript is used to indicate that the variable has been 
averaged with respect to that subscript. 


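Substituting these values in equation (3) and simplifying, we have
the absolute minimum, $\chi^2_a$:

$$\chi^2_a = \sum_s \sum_i \sum_t \{X_{sit} - \bar{X}_{si.}\}^2 = \sum_s \sum_i \sum_t X_{sit}^2 - \sum_s \sum_i \frac{(\sum_t X_{sit})^2}{n}. \qquad (5)$$

The hypothesis which we wish to test first is

$$H_0: I_{si} = 0, \qquad (6)$$

i.e., the hypothesis that there is no influence of the interaction between
grades and schools. Assuming that $H_0$ is true, we have, from
equation (3),

$$\chi^2_{r0} = \sum_s \sum_i \sum_t \{X_{sit} - A - B_s - C_i\}^2. \qquad (7)$$

Minimizing $\chi^2_{r0}$ with regard to A, $B_s$, and $C_i$, we obtain

$$A = \bar{X}_{...}; \qquad B_s = \bar{X}_{s..} - \bar{X}_{...}; \qquad C_i = \bar{X}_{.i.} - \bar{X}_{...}. \qquad (8)$$

Substituting these values in equation (7) and simplifying, we obtain
the relative minimum value

$$\chi^2_{r0} = \sum_s \sum_i \sum_t \{X_{sit} - \bar{X}_{s..} - \bar{X}_{.i.} + \bar{X}_{...}\}^2$$

$$= \chi^2_a + \sum_s \sum_i \sum_t \{\bar{X}_{si.} - \bar{X}_{s..} - \bar{X}_{.i.} + \bar{X}_{...}\}^2$$

$$= \chi^2_a + \sum_s \sum_i \frac{(\sum_t X_{sit})^2}{n} - \sum_s \frac{(\sum_i \sum_t X_{sit})^2}{mn} - \sum_i \frac{(\sum_s \sum_t X_{sit})^2}{kn} + \frac{(\sum_s \sum_i \sum_t X_{sit})^2}{N} \qquad (9)$$

$$= \chi^2_a + \chi^2_0.$$
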

If we find that $H_0$ is true, then we may test the following two
relative hypotheses on the basis of $\chi^2_{r0}$:

$$H_1: B_s = 0, \qquad (10)$$

i.e., the hypothesis that there is no difference between different
grades; and

$$H_2: C_i = 0, \qquad (11)$$

i.e., the hypothesis that there is no difference between different
schools.

Assuming that $H_1$ is true, we minimize with regard to the remaining
quantities of $\chi^2_{r0}$ to obtain the relative minimum value of
$\chi^2_{r0}$, $\chi^2_{r1}$. We find

$$\chi^2_{r1} = \chi^2_a + \chi^2_0 + \sum_s \sum_i \sum_t \{\bar{X}_{s..} - \bar{X}_{...}\}^2$$

$$= \chi^2_a + \chi^2_0 + \sum_s \frac{(\sum_i \sum_t X_{sit})^2}{mn} - \frac{(\sum_s \sum_i \sum_t X_{sit})^2}{N} \qquad (12)$$

$$= \chi^2_a + \chi^2_0 + \chi^2_1.$$

Similarly, assuming that $H_2$ is true, we obtain the relative minimum
value of $\chi^2_{r0}$, $\chi^2_{r2}$, by using the same method:

$$\chi^2_{r2} = \chi^2_a + \chi^2_0 + \sum_s \sum_i \sum_t \{\bar{X}_{.i.} - \bar{X}_{...}\}^2$$

$$= \chi^2_a + \chi^2_0 + \sum_i \frac{(\sum_s \sum_t X_{sit})^2}{kn} - \frac{(\sum_s \sum_i \sum_t X_{sit})^2}{N} \qquad (13)$$

$$= \chi^2_a + \chi^2_0 + \chi^2_2.$$

The additive property of the sum of squares is demonstrated in
the identity

$$\sum_s \sum_i \sum_t \{X_{sit} - \bar{X}_{...}\}^2 = \chi^2_a + \chi^2_0 + \chi^2_1 + \chi^2_2. \qquad (14)$$

All the results may be summarized as in Table 1:

In the case of equal subclass numbers one finds no difficulty in 
making a complete analysis for the different kinds of variances. In 
the case of unequal or disproportionate numbers of observations, 
however, the calculation will become much more complicated, as dem- 
onstrated below. 











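TABLE 1

Analysis of Variance of Test Scores of Different
Grades in Different Schools

Variance          Degrees of Freedom   Sum of Squares
Between Grades    k - 1                $\sum_s \sum_i \sum_t \{\bar{X}_{s..} - \bar{X}_{...}\}^2$
Between Schools   m - 1                $\sum_s \sum_i \sum_t \{\bar{X}_{.i.} - \bar{X}_{...}\}^2$
Interaction       km - k - m + 1       $\sum_s \sum_i \sum_t \{\bar{X}_{si.} - \bar{X}_{s..} - \bar{X}_{.i.} + \bar{X}_{...}\}^2$
Error             N - km               $\sum_s \sum_i \sum_t \{X_{sit} - \bar{X}_{si.}\}^2$
Total             N - 1                $\sum_s \sum_i \sum_t \{X_{sit} - \bar{X}_{...}\}^2$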
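
The entries of Table 1 can be checked numerically. The following is a minimal sketch in Python, assuming the scores are held in a nested list `x[s][i]` (grade s, school i) with the same number of scores in every subclass; the function name and the small data set are illustrative and not taken from the paper.

```python
def balanced_two_way_ss(x):
    """Sums of squares of Table 1 for k grades, m schools, n pupils per subclass.

    x[s][i] is the list of the n scores for grade s in school i.
    """
    k, m, n = len(x), len(x[0]), len(x[0][0])
    N = k * m * n
    grand = sum(v for row in x for cell in row for v in cell) / N
    grade = [sum(v for cell in row for v in cell) / (m * n) for row in x]
    school = [sum(v for s in range(k) for v in x[s][i]) / (k * n)
              for i in range(m)]
    cell = [[sum(c) / n for c in row] for row in x]
    return {
        "grades": m * n * sum((g - grand) ** 2 for g in grade),
        "schools": k * n * sum((c - grand) ** 2 for c in school),
        "interaction": n * sum(
            (cell[s][i] - grade[s] - school[i] + grand) ** 2
            for s in range(k) for i in range(m)),
        "error": sum((v - cell[s][i]) ** 2
                     for s in range(k) for i in range(m) for v in x[s][i]),
        "total": sum((v - grand) ** 2
                     for row in x for c in row for v in c),
    }


# Hypothetical scores: 2 grades x 2 schools x 2 pupils.
scores = [[[1, 3], [2, 6]], [[5, 7], [9, 7]]]
parts = balanced_two_way_ss(scores)
print(parts)
```

With equal subclass numbers the four components add up to the total, which is the additive property stated in identity (14).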





EQUATIONS FOR UNEQUAL SUBCLASS NUMBERS 


A. Classification According to Two Criteria 

1. Two Subclasses for One Criterion

Assume we have given a test to boys and girls in different grades,
and that we have found, on the basis of the $L_1$ test, that we may combine
all the data. As the subclasses are of unequal size, we may denote by

$$X_{sit} = A + B_s + C_i + I_{si} + Z_{sit} \qquad (15)$$

the score made by the t-th pupil in the s-th grade for the i-th sex,
where $s = 1, 2, \ldots, k$; $i = 1, 2$; $t = 1, 2, \ldots, n_{si}$; k denotes the number
of grades; i denotes the sex; $n_{si}$ denotes the number of pupils in
each of the different subclasses (the additional subscript si is introduced
to denote the particular subclass considered). A is a measure
of the common ability of all the pupils and all the subclasses, and is
defined as the mean score for all the subjects tested; $B_s$ is a measure
of the ability of the s-th grade; $C_i$ is a measure of the ability of
the i-th sex; $I_{si}$ represents the influence of the interaction between
grades and sexes; $Z_{sit}$ represents the error or residual. Define also

$$\sum_i n_{si} = n_{s.} = n_{s1} + n_{s2};$$

$$\sum_s n_{si} = n_{.i}; \qquad \sum_s n_{s1} = n_{.1}; \qquad \sum_s n_{s2} = n_{.2}; \qquad (16)$$

$$\sum_s \sum_i n_{si} = \sum_s n_{s.} = \sum_i n_{.i} = N.$$



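It follows from the definition of A that

$$\sum_s n_{s.} B_s = 0; \qquad \sum_i n_{.i} C_i = 0. \qquad (17)$$

Proceeding as before, we write

$$\chi^2 = \sum_s \sum_i \sum_t \{X_{sit} - A - B_s - C_i - I_{si}\}^2. \qquad (18)$$

Minimizing $\chi^2$ with regard to A, $B_s$, $C_i$, and $I_{si}$, we have

$$A = \bar{X}_{...};$$

$$B_s = \bar{X}_{s..} - \bar{X}_{...} - \frac{1}{n_{s.}} \sum_i n_{si} (C_i + I_{si});$$

$$C_i = \bar{X}_{.i.} - \bar{X}_{...} - \frac{1}{n_{.i}} \sum_s n_{si} (B_s + I_{si}); \qquad (19)$$

$$I_{si} = \bar{X}_{si.} - \bar{X}_{s..} - \bar{X}_{.i.} + \bar{X}_{...} + \frac{1}{n_{s.}} \sum_i n_{si} (C_i + I_{si}) + \frac{1}{n_{.i}} \sum_s n_{si} (B_s + I_{si}),$$

where $\bar{X}_{...}$, $\bar{X}_{s..}$, $\bar{X}_{.i.}$, and $\bar{X}_{si.}$ are as defined earlier. Substituting
these values in equation (18) and simplifying, we obtain the value of
the absolute minimum, $\chi^2_a$:

$$\chi^2_a = \sum_s \sum_i \sum_t \{X_{sit} - \bar{X}_{si.}\}^2 = \sum_s \sum_i \sum_t X_{sit}^2 - \sum_s \sum_i \{n_{si} \bar{X}_{si.}^2\}$$

$$= \sum_s \sum_i \sum_t X_{sit}^2 - \sum_s \{n_{s1} \bar{X}_{s1.}^2 + n_{s2} \bar{X}_{s2.}^2\}. \qquad (20)$$

The hypothesis we wish to test first is

$$H_0: I_{si} = 0, \qquad (21)$$

i.e., the hypothesis that there is no influence of the interaction between
grades and sex. Assuming that $H_0$ is true, we have

$$\chi^2_{r0} = \sum_s \sum_i \sum_t \{X_{sit} - A - B_s - C_i\}^2. \qquad (22)$$

If we denote by

$$A_i = A + C_i, \qquad (23)$$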











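then we may rewrite equation (22) in the form

$$\chi^2_{r0} = \sum_s \sum_t \{X_{s1t} - A_1 - B_s\}^2 + \sum_s \sum_t \{X_{s2t} - A_2 - B_s\}^2. \qquad (24)$$

Minimizing $\chi^2_{r0}$ with regard to $A_1$, $A_2$, and $B_s$, we obtain

$$\sum_s \sum_t \{X_{s1t} - A_1 - B_s\} = 0; \qquad (25)$$

$$\sum_s \sum_t \{X_{s2t} - A_2 - B_s\} = 0; \qquad (26)$$

$$\sum_t \{(X_{s1t} - A_1 - B_s) + (X_{s2t} - A_2 - B_s)\} = 0. \qquad (27)$$

Solving equation (27) for $B_s$, we have

$$B_s = \frac{1}{n_{s.}} \{(n_{s1} \bar{X}_{s1.} + n_{s2} \bar{X}_{s2.}) - (n_{s1} A_1 + n_{s2} A_2)\}. \qquad (28)$$

Substituting (28) in (25), we have

$$\sum_s n_{s1} \Big[\bar{X}_{s1.} - A_1 - \frac{1}{n_{s.}} \{(n_{s1} \bar{X}_{s1.} + n_{s2} \bar{X}_{s2.}) - (n_{s1} A_1 + n_{s2} A_2)\}\Big] = 0. \qquad (29)$$

From (16), $n_{s.} = n_{s1} + n_{s2}$, so we have

$$\sum_s \frac{n_{s1}}{n_{s.}} \Big[(n_{s1} + n_{s2})(\bar{X}_{s1.} - A_1) - (n_{s1} \bar{X}_{s1.} + n_{s2} \bar{X}_{s2.}) + (n_{s1} A_1 + n_{s2} A_2)\Big] = 0, \qquad (30)$$

which reduces to

$$A_1 - A_2 = \frac{\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}} (\bar{X}_{s1.} - \bar{X}_{s2.})\}}{\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}}\}}. \qquad (31)$$

Similarly, substituting (28) in (26), we have

$$A_2 - A_1 = \frac{\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}} (\bar{X}_{s2.} - \bar{X}_{s1.})\}}{\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}}\}}. \qquad (32)$$
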


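The equations (31) and (32) are identical. But from (17) and (28),
we obtain

$$\sum_s \{(n_{s1} \bar{X}_{s1.} + n_{s2} \bar{X}_{s2.}) - (n_{s1} A_1 + n_{s2} A_2)\} = 0, \qquad (33)$$

which reduces immediately to

$$n_{.1} A_1 + n_{.2} A_2 = n_{.1} \bar{X}_{.1.} + n_{.2} \bar{X}_{.2.} = N \bar{X}_{...}. \qquad (34)$$

Combining the equations (31) or (32) and (34) and solving for $A_1$
and $A_2$, we obtain

$$A_1 = \bar{X}_{...} + \frac{n_{.2}}{N} \cdot \frac{\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}} (\bar{X}_{s1.} - \bar{X}_{s2.})\}}{\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}}\}}, \qquad (35)$$

$$A_2 = \bar{X}_{...} - \frac{n_{.1}}{N} \cdot \frac{\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}} (\bar{X}_{s1.} - \bar{X}_{s2.})\}}{\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}}\}}. \qquad (36)$$

Substituting all the obtained values in equation (22) and using the
method introduced by Johnson and Neyman (3), we have

$$\chi^2_{r0} = \sum_s \sum_i \sum_t X_{sit}^2 - \sum_s \Big\{\frac{(n_{s1} \bar{X}_{s1.} + n_{s2} \bar{X}_{s2.})^2}{n_{s.}}\Big\} - \frac{\big[\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}} (\bar{X}_{s1.} - \bar{X}_{s2.})\}\big]^2}{\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}}\}}$$

$$= \chi^2_a + \sum_s \sum_i \{n_{si} \bar{X}_{si.}^2\} - \sum_s \{n_{s.} \bar{X}_{s..}^2\} - \frac{\big[\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}} (\bar{X}_{s1.} - \bar{X}_{s2.})\}\big]^2}{\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}}\}} \qquad (37)$$

$$= \chi^2_a + \chi^2_0.$$
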
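
Equations (20), (31) and (37) can be traced on a small data set. The sketch below is only an illustration in Python, assuming the scores are stored as `x[s][i]` with i = 0, 1 for the two sexes and with subclass sizes free to differ; the function name is hypothetical.

```python
def sex_by_grade_interaction(x):
    """Return (A_1 - A_2, chi2_a, chi2_0) for k grades and two sexes.

    x[s][i] is the list of scores for grade s and sex i (i = 0 or 1).
    """
    k = len(x)
    n = [[len(c) for c in row] for row in x]
    mean = [[sum(c) / len(c) for c in row] for row in x]
    sum_sq = sum(v * v for row in x for c in row for v in c)

    # Equation (20): scatter of the scores about their subclass means.
    chi2_a = sum_sq - sum(n[s][i] * mean[s][i] ** 2
                          for s in range(k) for i in range(2))

    # Weights n_s1 n_s2 / n_s. and the sex difference within each grade.
    w = [n[s][0] * n[s][1] / (n[s][0] + n[s][1]) for s in range(k)]
    d = [mean[s][0] - mean[s][1] for s in range(k)]
    wd = sum(wi * di for wi, di in zip(w, d))

    # Equation (31): weighted mean of the within-grade differences.
    a1_minus_a2 = wd / sum(w)

    # Equation (37): relative minimum when the interaction is assumed zero.
    grade_term = sum(
        (n[s][0] * mean[s][0] + n[s][1] * mean[s][1]) ** 2
        / (n[s][0] + n[s][1]) for s in range(k))
    chi2_r0 = sum_sq - grade_term - wd ** 2 / sum(w)
    return a1_minus_a2, chi2_a, chi2_r0 - chi2_a


# Hypothetical data whose subclass means are exactly additive (2, 4 / 6, 8).
diff, chi2_a, chi2_0 = sex_by_grade_interaction(
    [[[1, 3], [3, 4, 5]], [[6], [7, 9]]])
print(diff, chi2_a, chi2_0)
```

When the subclass means are exactly additive, as in the example data, the interaction component is zero apart from rounding, whatever the subclass numbers.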
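If we find that $H_0$ is true, then in the second stage we wish to
test the relative hypotheses which are based on $\chi^2_{r0}$:

$$H_{1'}: C_i = 0 \mid B_s, \qquad (38)$$

i.e., the hypothesis that there is no difference between the sexes within
each grade; and

$$H_{1''}: B_s = 0 \mid C_i, \qquad (39)$$

i.e., the hypothesis that there is no difference between different grades
within each sex. For $H_{1'}$, we have

$$\chi^2_{r1'} = \sum_s \sum_i \sum_t \{X_{sit} - A - B_s\}^2. \qquad (40)$$

For $H_{1''}$, we have

$$\chi^2_{r1''} = \sum_s \sum_i \sum_t \{X_{sit} - A - C_i\}^2. \qquad (41)$$

Following the method used before, we obtain

$$\chi^2_{r1'} = \sum_s \sum_i \sum_t \{X_{sit} - \bar{X}_{s..}\}^2 = \sum_s \sum_i \sum_t X_{sit}^2 - \sum_s \{n_{s.} \bar{X}_{s..}^2\}$$

$$= \chi^2_a + \chi^2_0 + \frac{\big[\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}} (\bar{X}_{s1.} - \bar{X}_{s2.})\}\big]^2}{\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}}\}} = \chi^2_a + \chi^2_0 + \chi^2_{1'}; \qquad (42)$$

$$\chi^2_{r1''} = \sum_s \sum_i \sum_t \{X_{sit} - \bar{X}_{.i.}\}^2 = \sum_s \sum_i \sum_t X_{sit}^2 - \sum_i \{n_{.i} \bar{X}_{.i.}^2\}$$

$$= \chi^2_a + \chi^2_0 + \sum_s \{n_{s.} \bar{X}_{s..}^2\} - \sum_i \{n_{.i} \bar{X}_{.i.}^2\} + \frac{\big[\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}} (\bar{X}_{s1.} - \bar{X}_{s2.})\}\big]^2}{\sum_s \{\frac{n_{s1} n_{s2}}{n_{s.}}\}} = \chi^2_a + \chi^2_0 + \chi^2_{1''}. \qquad (43)$$

Finally we wish to test the following two relative hypotheses
which are based on $\chi^2_{r1'}$ and $\chi^2_{r1''}$ respectively:

$$H_{2'}: B_s = 0, \qquad (44)$$

i.e., the hypothesis that there is no difference between different grades
as a whole; and

$$H_{2''}: C_i = 0, \qquad (45)$$

i.e., the hypothesis that there is no difference between the sexes as a
whole. For both of $H_{2'}$ and $H_{2''}$, we have

$$\chi^2_{r2'} = \sum_s \sum_i \sum_t \{X_{sit} - A\}^2 = \chi^2_{r2''}. \qquad (46)$$

Using the same method, we obtain

$$\chi^2_{r2'} = \sum_s \sum_i \sum_t X_{sit}^2 - N \bar{X}_{...}^2 = \chi^2_a + \chi^2_0 + \chi^2_{1'} + \sum_s \{n_{s.} \bar{X}_{s..}^2\} - N \bar{X}_{...}^2 = \chi^2_a + \chi^2_0 + \chi^2_{1'} + \chi^2_{2'}; \qquad (47)$$

$$\chi^2_{r2''} = \sum_s \sum_i \sum_t X_{sit}^2 - N \bar{X}_{...}^2 = \chi^2_a + \chi^2_0 + \chi^2_{1''} + \sum_i \{n_{.i} \bar{X}_{.i.}^2\} - N \bar{X}_{...}^2 = \chi^2_a + \chi^2_0 + \chi^2_{1''} + \chi^2_{2''}. \qquad (48)$$
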


These results may be summarized as in Tables 2 and 3:

TABLE 2

Analysis of Variance of Test Scores of Different Grades for Two Sexes
with Unequal Subclass Numbers (Solution I)

Variance                          Degrees of Freedom   Sum of Squares
Error                             N - 2k               $\chi^2_a$
Interaction                       k - 1                $\chi^2_0$
Between Sexes (Given By Grade)    1                    $\chi^2_{1'}$
Between Grades                    k - 1                $\chi^2_{2'}$
Total                             N - 1                $\sum_s \sum_i \sum_t X_{sit}^2 - N \bar{X}_{...}^2$

TABLE 3

Analysis of Variance of Test Scores of Different Grades for
Two Sexes with Unequal Subclass Numbers (Solution II)

Variance                          Degrees of Freedom   Sum of Squares
Error                             N - 2k               $\chi^2_a$
Interaction                       k - 1                $\chi^2_0$
Between Grades (Given By Sex)     k - 1                $\chi^2_{1''}$
Between Sexes                     1                    $\chi^2_{2''}$
Total                             N - 1                $\sum_s \sum_i \sum_t X_{sit}^2 - N \bar{X}_{...}^2$

From the foregoing tables one can readily see that for two criteria
with unequal subclass numbers there are two solutions in making a
complete analysis. The sums of squares for error and for interaction
are identical, but there are two possible values for the sum of squares
between grades or between sexes: one is affected by the other criterion
and the other is not. Finally, it will be noticed that the calculation
in the present case is connected with the combinations of different
subclass numbers, and this connection does not exist in the case
of equal subclass numbers. If we have three subclasses for one criterion,
again with unequal subclass numbers, the calculation will become
even more complicated, as demonstrated below.
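
The two solutions of Tables 2 and 3 can be laid side by side for a concrete data set. The following Python sketch assumes the layout `x[s][i]` (grade s, sex i = 0 or 1) and uses equations (42), (43), (47) and (48); the dictionary keys and the function name are illustrative labels only.

```python
def two_solutions(x):
    """Return (solution_1, solution_2, total) for k grades and two sexes.

    x[s][i] is the list of scores for grade s and sex i; sizes may differ.
    """
    k = len(x)
    n = [[len(c) for c in row] for row in x]
    tot = [[sum(c) for c in row] for row in x]
    N = sum(map(sum, n))
    sum_sq = sum(v * v for row in x for c in row for v in c)

    # Uncorrected sums of squares: n * mean^2 written as total^2 / n.
    cells = sum(tot[s][i] ** 2 / n[s][i] for s in range(k) for i in range(2))
    grades = sum(sum(tot[s]) ** 2 / sum(n[s]) for s in range(k))
    sexes = sum(sum(tot[s][i] for s in range(k)) ** 2
                / sum(n[s][i] for s in range(k)) for i in range(2))
    correction = sum(map(sum, tot)) ** 2 / N

    # Sexes adjusted for grades: [sum w d]^2 / sum w, as in equation (42).
    w = [n[s][0] * n[s][1] / sum(n[s]) for s in range(k)]
    d = [tot[s][0] / n[s][0] - tot[s][1] / n[s][1] for s in range(k)]
    adjusted = sum(wi * di for wi, di in zip(w, d)) ** 2 / sum(w)

    error = sum_sq - cells
    interaction = cells - grades - adjusted
    solution_1 = {"error": error, "interaction": interaction,
                  "sexes given grade": adjusted,
                  "grades": grades - correction}
    solution_2 = {"error": error, "interaction": interaction,
                  "grades given sex": grades - sexes + adjusted,
                  "sexes": sexes - correction}
    return solution_1, solution_2, sum_sq - correction


# Hypothetical unequal subclasses.
s1, s2, total = two_solutions([[[1, 3], [3, 4, 5]], [[6], [7, 12]]])
print(s1, s2, total)
```

Both dictionaries add up to the same total, while the grade and sex entries differ between them; with equal subclass numbers the two solutions coincide.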


2. Three Subclasses for One Criterion 

Assume we have given a test to pupils in different grades in three
schools, with unequal numbers of pupils in each subclass, and that we
may combine all the data as in the previous cases. Most of the steps
in the analysis of variance will be the same as in the last section. Let
us emphasize only the calculation of variance due to interaction, i.e.,
the test of the hypothesis

$$H_0: I_{si} = 0, \qquad (49)$$

from the basic assumption

$$X_{sit} = A + B_s + C_i + I_{si} + Z_{sit}, \qquad (50)$$

where $s = 1, 2, \ldots, k$; $i = 1, 2, 3$; $t = 1, 2, \ldots, n_{si}$; k denotes the
number of grades; i denotes the school; $n_{si}$ denotes the
number of pupils in a particular subclass. As before, A is a measure
of the common ability and is defined as the mean score of all the subjects
tested; $B_s$ is a measure of the ability of the s-th grade; $C_i$ is a
measure of the ability of the i-th school; $I_{si}$ is the influence of interaction;
and $Z_{sit}$ is the error or residual. Assuming that $H_0$ is true and
following the method used before, we have

$$\chi^2_{r0} = \sum_s \sum_i \sum_t \{X_{sit} - A - B_s - C_i\}^2, \qquad (51)$$

where

$$\sum_i n_{si} = n_{s.} = n_{s1} + n_{s2} + n_{s3};$$

$$\sum_s n_{si} = n_{.i}; \qquad \sum_s n_{s1} = n_{.1}; \qquad \sum_s n_{s2} = n_{.2}; \qquad \sum_s n_{s3} = n_{.3}; \qquad (52)$$

$$\sum_s \sum_i n_{si} = \sum_s n_{s.} = \sum_i n_{.i} = N;$$

and

$$\sum_s n_{s.} B_s = 0; \qquad \sum_i n_{.i} C_i = 0. \qquad (53)$$

If we denote by

$$A_i = A + C_i, \qquad (54)$$

then we have

$$\chi^2_{r0} = \sum_s \sum_t \{(X_{s1t} - A_1 - B_s)^2 + (X_{s2t} - A_2 - B_s)^2 + (X_{s3t} - A_3 - B_s)^2\}. \qquad (55)$$

Minimizing $\chi^2_{r0}$ with regard to $A_1$, $A_2$, $A_3$, and $B_s$, we obtain

$$\sum_s \sum_t \{X_{s1t} - A_1 - B_s\} = 0; \qquad (56)$$

$$\sum_s \sum_t \{X_{s2t} - A_2 - B_s\} = 0; \qquad (57)$$

$$\sum_s \sum_t \{X_{s3t} - A_3 - B_s\} = 0; \qquad (58)$$

$$\sum_t \{(X_{s1t} - A_1 - B_s) + (X_{s2t} - A_2 - B_s) + (X_{s3t} - A_3 - B_s)\} = 0. \qquad (59)$$

Solving equation (59) for $B_s$, we have

$$B_s = \frac{1}{n_{s.}} \{(n_{s1} \bar{X}_{s1.} + n_{s2} \bar{X}_{s2.} + n_{s3} \bar{X}_{s3.}) - (n_{s1} A_1 + n_{s2} A_2 + n_{s3} A_3)\}. \qquad (60)$$



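Substituting (60) in (56) and following the method used in the last
section, we have

$$\sum_s \frac{n_{s1}}{n_{s.}} \Big[n_{s2} \{(\bar{X}_{s1.} - \bar{X}_{s2.}) - (A_1 - A_2)\} + n_{s3} \{(\bar{X}_{s1.} - \bar{X}_{s3.}) - (A_1 - A_3)\}\Big] = 0. \qquad (61)$$

Similarly, substituting (60) in (57) and (58) respectively, we have

$$\sum_s \frac{n_{s2}}{n_{s.}} \Big[n_{s1} \{(\bar{X}_{s2.} - \bar{X}_{s1.}) - (A_2 - A_1)\} + n_{s3} \{(\bar{X}_{s2.} - \bar{X}_{s3.}) - (A_2 - A_3)\}\Big] = 0; \qquad (62)$$

$$\sum_s \frac{n_{s3}}{n_{s.}} \Big[n_{s1} \{(\bar{X}_{s3.} - \bar{X}_{s1.}) - (A_3 - A_1)\} + n_{s2} \{(\bar{X}_{s3.} - \bar{X}_{s2.}) - (A_3 - A_2)\}\Big] = 0. \qquad (63)$$

The equations (61), (62) and (63) are related, i.e., (61) + (62) +
(63) = 0. But from (53) and (60), we have

$$n_{.1} A_1 + n_{.2} A_2 + n_{.3} A_3 = N \bar{X}_{...}. \qquad (64)$$

Next we combine any two of the equations (61), (62), (63) and also
the equation (64) and solve for $A_1$, $A_2$, and $A_3$. Let us define

$$\sum_s \Big\{\frac{n_{s1} n_{s2}}{n_{s.}}\Big\} = a; \qquad \sum_s \Big\{\frac{n_{s2} n_{s3}}{n_{s.}}\Big\} = b; \qquad \sum_s \Big\{\frac{n_{s3} n_{s1}}{n_{s.}}\Big\} = c;$$

$$\sum_s \Big\{\frac{n_{s1} n_{s2}}{n_{s.}} (\bar{X}_{s1.} - \bar{X}_{s2.})\Big\} = a'; \qquad (65)$$

$$\sum_s \Big\{\frac{n_{s2} n_{s3}}{n_{s.}} (\bar{X}_{s2.} - \bar{X}_{s3.})\Big\} = b';$$

$$\sum_s \Big\{\frac{n_{s3} n_{s1}}{n_{s.}} (\bar{X}_{s3.} - \bar{X}_{s1.})\Big\} = c'.$$

Then (61) and (62) can be translated to (66) and (67), respectively:

$$(a + c) A_1 - a A_2 - c A_3 = a' - c'; \qquad (66)$$

$$-a A_1 + (a + b) A_2 - b A_3 = b' - a'. \qquad (67)$$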
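Using the determinant method to solve (64), (66), and (67) for $A_1$,
$A_2$, and $A_3$, we have

$$A_1 = \bar{X}_{...} + \frac{-(N - n_{.1})\, b (c' - a') + n_{.2}\, c (a' - b') + n_{.3}\, a (b' - c')}{N (ab + ac + bc)}; \qquad (68)$$

$$A_2 = \bar{X}_{...} + \frac{n_{.1}\, b (c' - a') - (N - n_{.2})\, c (a' - b') + n_{.3}\, a (b' - c')}{N (ab + ac + bc)}; \qquad (69)$$

$$A_3 = \bar{X}_{...} + \frac{n_{.1}\, b (c' - a') + n_{.2}\, c (a' - b') - (N - n_{.3})\, a (b' - c')}{N (ab + ac + bc)}. \qquad (70)$$

It follows that

$$A_1 - A_2 = \frac{c (a' - b') - b (c' - a')}{ab + ac + bc}; \qquad (71)$$

$$A_2 - A_3 = \frac{a (b' - c') - c (a' - b')}{ab + ac + bc}; \qquad (72)$$

$$A_3 - A_1 = \frac{b (c' - a') - a (b' - c')}{ab + ac + bc}. \qquad (73)$$

Substituting all the values in equation (51) and using the method
introduced by Johnson and Neyman (3), we obtain

$$\chi^2_{r0} = \sum_s \sum_i \sum_t X_{sit}^2 - \sum_s \Big\{\frac{(n_{s1} \bar{X}_{s1.} + n_{s2} \bar{X}_{s2.} + n_{s3} \bar{X}_{s3.})^2}{n_{s.}}\Big\}$$

$$- \frac{(b + c) a'^2 + (a + c) b'^2 + (a + b) c'^2}{ab + ac + bc} + \frac{2 a b' c' + 2 b a' c' + 2 c a' b'}{ab + ac + bc}, \qquad (74)$$
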





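which may be written in the following form by substituting in the
values from (65), each sum being taken over the grades s:

$$\chi^2_{r0} = \sum_s \sum_i \sum_t X_{sit}^2 - \sum_s \Big\{\frac{(n_{s1} \bar{X}_{s1.} + n_{s2} \bar{X}_{s2.} + n_{s3} \bar{X}_{s3.})^2}{n_{s.}}\Big\} - \frac{P - 2Q}{D}, \qquad (75)$$

where

$$P = \Big[\sum \frac{n_{s2} n_{s3}}{n_{s.}} + \sum \frac{n_{s3} n_{s1}}{n_{s.}}\Big] \Big[\sum \frac{n_{s1} n_{s2}}{n_{s.}} (\bar{X}_{s1.} - \bar{X}_{s2.})\Big]^2 + \Big[\sum \frac{n_{s1} n_{s2}}{n_{s.}} + \sum \frac{n_{s3} n_{s1}}{n_{s.}}\Big] \Big[\sum \frac{n_{s2} n_{s3}}{n_{s.}} (\bar{X}_{s2.} - \bar{X}_{s3.})\Big]^2 + \Big[\sum \frac{n_{s1} n_{s2}}{n_{s.}} + \sum \frac{n_{s2} n_{s3}}{n_{s.}}\Big] \Big[\sum \frac{n_{s3} n_{s1}}{n_{s.}} (\bar{X}_{s3.} - \bar{X}_{s1.})\Big]^2,$$

$$Q = \sum \frac{n_{s1} n_{s2}}{n_{s.}} \Big[\sum \frac{n_{s2} n_{s3}}{n_{s.}} (\bar{X}_{s2.} - \bar{X}_{s3.})\Big] \Big[\sum \frac{n_{s3} n_{s1}}{n_{s.}} (\bar{X}_{s3.} - \bar{X}_{s1.})\Big] + \sum \frac{n_{s2} n_{s3}}{n_{s.}} \Big[\sum \frac{n_{s1} n_{s2}}{n_{s.}} (\bar{X}_{s1.} - \bar{X}_{s2.})\Big] \Big[\sum \frac{n_{s3} n_{s1}}{n_{s.}} (\bar{X}_{s3.} - \bar{X}_{s1.})\Big] + \sum \frac{n_{s3} n_{s1}}{n_{s.}} \Big[\sum \frac{n_{s1} n_{s2}}{n_{s.}} (\bar{X}_{s1.} - \bar{X}_{s2.})\Big] \Big[\sum \frac{n_{s2} n_{s3}}{n_{s.}} (\bar{X}_{s2.} - \bar{X}_{s3.})\Big],$$

$$D = \sum \frac{n_{s1} n_{s2}}{n_{s.}} \sum \frac{n_{s2} n_{s3}}{n_{s.}} + \sum \frac{n_{s1} n_{s2}}{n_{s.}} \sum \frac{n_{s3} n_{s1}}{n_{s.}} + \sum \frac{n_{s2} n_{s3}}{n_{s.}} \sum \frac{n_{s3} n_{s1}}{n_{s.}}.$$
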
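
The three-school solution can be followed step by step in code. The sketch below, in Python, evaluates the quantities of (65), the school constants of (68) to (70), the grade constants of (60) and the relative minimum of (74). The layout `x[s][i]` (grade s, school i = 0, 1, 2) and the function name are assumptions made for illustration.

```python
def three_school_additive_fit(x):
    """Return (A, B, chi2_r0) for k grades and three schools.

    x[s][i] is the list of scores for grade s in school i; sizes may differ.
    """
    grades = range(len(x))
    n = [[len(c) for c in row] for row in x]
    xb = [[sum(c) / len(c) for c in row] for row in x]
    n_row = [sum(r) for r in n]
    n_col = [sum(n[s][i] for s in grades) for i in range(3)]
    N = sum(n_row)
    grand = sum(v for row in x for c in row for v in c) / N

    def pair(i, j):
        """Sum of n_si n_sj / n_s. and the same sum weighted by mean differences."""
        w = [n[s][i] * n[s][j] / n_row[s] for s in grades]
        return sum(w), sum(w[s] * (xb[s][i] - xb[s][j]) for s in grades)

    a, a1 = pair(0, 1)   # a and a' of (65)
    b, b1 = pair(1, 2)   # b and b'
    c, c1 = pair(2, 0)   # c and c'
    den = a * b + a * c + b * c

    # Equations (68)-(70).
    p, q, r = b * (c1 - a1), c * (a1 - b1), a * (b1 - c1)
    A = [grand + (-(N - n_col[0]) * p + n_col[1] * q + n_col[2] * r) / (N * den),
         grand + (n_col[0] * p - (N - n_col[1]) * q + n_col[2] * r) / (N * den),
         grand + (n_col[0] * p + n_col[1] * q - (N - n_col[2]) * r) / (N * den)]

    # Equation (60).
    B = [(sum(n[s][i] * (xb[s][i] - A[i]) for i in range(3))) / n_row[s]
         for s in grades]

    # Equation (74).
    sum_sq = sum(v * v for row in x for cell in row for v in cell)
    grade_term = sum(sum(n[s][i] * xb[s][i] for i in range(3)) ** 2 / n_row[s]
                     for s in grades)
    reduction = ((b + c) * a1 ** 2 + (a + c) * b1 ** 2 + (a + b) * c1 ** 2
                 - 2 * (a * b1 * c1 + b * a1 * c1 + c * a1 * b1)) / den
    return A, B, sum_sq - grade_term - reduction


# Hypothetical data with additive subclass means (2, 4, 9 / 6, 8, 13).
A, B, chi2_r0 = three_school_additive_fit(
    [[[1, 3], [4], [8, 9, 10]], [[6], [7, 9], [12, 14]]])
print(A, B, chi2_r0)
```

A useful check on the arithmetic is that the value from (74) agrees with the sum of squared residuals computed directly from the fitted constants, i.e., with equation (51).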
3. k Subclasses for One Criterion and m for the Other

Again, let us concentrate on the calculation of variance due to
interaction, i.e.,

$$\chi^2_{r0} = \sum_s \sum_i \sum_t \{X_{sit} - A - B_s - C_i\}^2, \qquad (76)$$

where all the quantities are as defined earlier, except that $i = 1, 2, \ldots, m$. Define

$$\sum_i n_{si} = n_{s.} = n_{s1} + \cdots + n_{sm};$$

$$\sum_s n_{si} = n_{.i}; \qquad \sum_s n_{s1} = n_{.1}; \qquad \ldots; \qquad \sum_s n_{sm} = n_{.m}; \qquad (77)$$

$$\sum_s \sum_i n_{si} = \sum_s n_{s.} = \sum_i n_{.i} = N.$$

It follows that

$$\sum_s n_{s.} B_s = 0; \qquad \sum_i n_{.i} C_i = 0. \qquad (78)$$

Assume that

$$A_i = A + C_i. \qquad (79)$$

We write

$$\chi^2_{r0} = \sum_s \sum_t \{(X_{s1t} - A_1 - B_s)^2 + \cdots + (X_{smt} - A_m - B_s)^2\}. \qquad (80)$$

Minimizing $\chi^2_{r0}$ with regard to $A_1, A_2, \ldots, A_m$, and $B_s$, we have

$$\sum_s \sum_t \{X_{s1t} - A_1 - B_s\} = 0; \qquad \ldots; \qquad \sum_s \sum_t \{X_{smt} - A_m - B_s\} = 0; \qquad (81)$$

$$\sum_t \{(X_{s1t} - A_1 - B_s) + \cdots + (X_{smt} - A_m - B_s)\} = 0. \qquad (82)$$

Solving (82) for $B_s$, we obtain

$$B_s = \frac{1}{n_{s.}} \{(n_{s1} \bar{X}_{s1.} + \cdots + n_{sm} \bar{X}_{sm.}) - (n_{s1} A_1 + \cdots + n_{sm} A_m)\}. \qquad (83)$$

Substituting (83) in (81), we have

$$\sum_s \frac{n_{s1}}{n_{s.}} \Big[n_{s2} \{(\bar{X}_{s1.} - \bar{X}_{s2.}) - (A_1 - A_2)\} + \cdots + n_{sm} \{(\bar{X}_{s1.} - \bar{X}_{sm.}) - (A_1 - A_m)\}\Big] = 0;$$

$$\cdots \qquad (84)$$

$$\sum_s \frac{n_{sm}}{n_{s.}} \Big[n_{s1} \{(\bar{X}_{sm.} - \bar{X}_{s1.}) - (A_m - A_1)\} + \cdots + n_{s(m-1)} \{(\bar{X}_{sm.} - \bar{X}_{s(m-1).}) - (A_m - A_{m-1})\}\Big] = 0.$$

The equations denoted by (84) are actually m in number, whose sum
is zero. But from (78) and (83), we have

$$n_{.1} A_1 + \cdots + n_{.m} A_m = N \bar{X}_{...}. \qquad (85)$$

Combining m - 1 equations out of the m equations denoted by (84)
and also equation (85), and solving by using the determinant method,
we can obtain all the values of $A_1, A_2, \ldots, A_m$. Consequently,
we can also obtain the value of $B_s$ by substituting the values of $A_1,
A_2, \ldots, A_m$ in (83). Finally, substituting all the obtained values in
the equation (76), we can get the value of $\chi^2_{r0}$.
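
For general m the same steps can be carried out numerically. The following Python sketch collects the coefficients of $A_1, \ldots, A_m$ in (84), replaces the last of those equations by (85), and solves the system. Gaussian elimination is used here in place of the determinant method named in the text, purely for convenience; the function names and data layout `x[s][i]` are assumptions.

```python
def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting (stand-in for determinants)."""
    size = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def additive_fit(x):
    """Return (A, B, chi2_r0) for k grades and m schools with unequal subclasses.

    x[s][i] is the list of scores for grade s in school i.
    """
    k, m = len(x), len(x[0])
    n = [[len(c) for c in row] for row in x]
    tot = [[sum(c) for c in row] for row in x]
    n_row = [sum(r) for r in n]
    n_col = [sum(n[s][i] for s in range(k)) for i in range(m)]
    row_mean = [sum(tot[s]) / n_row[s] for s in range(k)]

    # Equations (84) with the coefficients of A_1, ..., A_m collected.
    matrix = [[(n_col[i] if i == j else 0.0)
               - sum(n[s][i] * n[s][j] / n_row[s] for s in range(k))
               for j in range(m)] for i in range(m)]
    rhs = [sum(tot[s][i] - n[s][i] * row_mean[s] for s in range(k))
           for i in range(m)]

    # The m equations sum to zero, so the last one is replaced by (85).
    matrix[-1] = [float(c) for c in n_col]
    rhs[-1] = float(sum(map(sum, tot)))

    A = solve_linear(matrix, rhs)
    # Equation (83).
    B = [(sum(tot[s]) - sum(n[s][i] * A[i] for i in range(m))) / n_row[s]
         for s in range(k)]
    # Equation (76) evaluated at the solution.
    chi2_r0 = sum((v - A[i] - B[s]) ** 2
                  for s in range(k) for i in range(m) for v in x[s][i])
    return A, B, chi2_r0


# Hypothetical data: 2 grades, 3 schools, additive subclass means.
A, B, chi2_r0 = additive_fit([[[1, 3], [4], [8, 9, 10]], [[6], [7, 9], [12, 14]]])
print(A, B, chi2_r0)
```

Subtracting the absolute minimum $\chi^2_a$ from the returned value gives the interaction sum of squares, as in the two- and three-subclass cases.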



































B. Classification According to k Criteria

If we have k criteria for the classification of test scores, there
will be k! solutions in making a complete analysis, and $2^{k-1}$ different
answers for the sum of squares between subclasses of each criterion.
For instance, assume we have given a test to different grades of different
schools with both sexes. The basic assumption will be as follows:

$$X_{sijt} = A + B_s + C_i + D_j + I_{sij} + Z_{sijt}, \qquad (86)$$

where A denotes the mean score of all the pupils tested; $B_s$ denotes a
measure of the ability of the s-th grade; $C_i$ denotes a measure of the
ability of the i-th school; $D_j$ denotes a measure of the ability of the
j-th sex; $I_{sij}$ represents all the interactions, i.e.,

    Interaction between grades and schools  }
    Interaction between grades and sex      }  (Interactions of first order)
    Interaction between schools and sex     }

    Interaction between grades, schools, and sex  (Interaction of second order)

and $Z_{sijt}$ represents the error or residual.
The hypotheses we wish to test are as shown in the following
diagram:

    I_sij = 0
      -> B_s = 0 | C_i, D_j
           -> C_i = 0 | D_j  ->  D_j = 0
           -> D_j = 0 | C_i  ->  C_i = 0
      -> C_i = 0 | B_s, D_j
           -> B_s = 0 | D_j  ->  D_j = 0
           -> D_j = 0 | B_s  ->  B_s = 0
      -> D_j = 0 | B_s, C_i
           -> B_s = 0 | C_i  ->  C_i = 0
           -> C_i = 0 | B_s  ->  B_s = 0

where the arrows join the consecutive steps, in which each later hypothesis
uses the preceding one as a basis. We can readily see that there
are 6 (3!) different solutions for the complete analysis, and that there
are 4 ($2^{3-1}$) different answers for the sum of squares between subclasses
of each criterion. It is clear, also, that different answers bear
different meanings. For instance, $B_s = 0$ is the hypothesis that there is
no significant difference between grades as a whole; while $B_s = 0 \mid C_i,
D_j$ is the hypothesis that there is no significant difference between
grades with both the influences of schools and sex partialled out. One
will also find little difficulty in calculating the pure interaction of second
order by working out all the interactions of first order and subtracting
their sum from the total interactions based on $I_{sij} = 0$.
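
The counts k! and $2^{k-1}$ can be confirmed by enumeration. In the short Python sketch below, each ordering of the criteria is read as one complete solution in which a criterion is tested given those still remaining in the model; the function names and the letter labels are illustrative.

```python
from itertools import combinations, permutations


def complete_solutions(criteria):
    """One solution per ordering: each criterion is tested given those after it."""
    return [[(c, tuple(sorted(order[pos + 1:]))) for pos, c in enumerate(order)]
            for order in permutations(criteria)]


def distinct_answers(criterion, criteria):
    """All sets of other criteria that may be held fixed while testing one criterion."""
    others = [c for c in criteria if c != criterion]
    return [given for size in range(len(others) + 1)
            for given in combinations(others, size)]


solutions = complete_solutions(["B", "C", "D"])
print(len(solutions))                             # number of complete solutions
print(distinct_answers("B", ["B", "C", "D"]))     # conditioning sets for grades
```

For three criteria this lists the six paths of the diagram and, for grades, the four hypotheses $B_s = 0$, $B_s = 0 \mid C_i$, $B_s = 0 \mid D_j$ and $B_s = 0 \mid C_i, D_j$.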

From all the above, with any number of criteria, whether or not 
the subclasses are of equal size, we can obtain all the values which we 
wish to find. 

In conclusion, the writer wishes to express his gratitude to Dr. 
R. W. B. Jackson, who has suggested the problem and given much 
help throughout the course of work. 


REFERENCES 


1. Godard, R. H., and Lindquist, E. F. An empirical study of the effect of
heterogeneous within-group variance upon certain F-tests of significance in
analysis of variance. Psychometrika, 1940, 5, 263-274.

2. Jackson, R. W. B. Application of the analysis of variance and covariance
method to educational problems. Bulletin No. 11, Department of Educational
Research, University of Toronto, 1940. Pp. 103.

3. Johnson, P. O., and Neyman, J. Tests of certain linear hypotheses and their
application to some educational problems. Statistical Research Memoirs,
Department of Applied Statistics, University College, London, W. C. 1, 1936,
1, 57-93. (See especially equations (6) to (17) on pp. 58-60.)

4. Welch, B. L. Note on an extension of the $L_1$ test. Statistical Research
Memoirs, Department of Applied Statistics, University College, London, W. C. 1,
1936, 1, 52-56.








PSYCHOMETRIKA—VOL. 7, NO. 3 
SEPTEMBER, 1942 







A GENERAL FACTOR IN IMPROVEMENT WITH PRACTICE 


K. W. HEESE 


UNIVERSITY OF STELLENBOSCH 
STELLENBOSCH, SOUTH AFRICA 












Results of 10 trials on 6 tests for 50 subjects were analyzed, first, 
by applying the centroid method to actual improvement or practice 
scores and, second, by applying a formula developed by Woodrow for 
determining factor loadings for practice scores from the differences 
between factor loadings of initial and final scores. Contrary to ex- 
pectation, the two methods yielded discrepant results, for the expla- 
nation of which a hypothesis is advanced. The operation of a general 
factor was not demonstrated. Tentative interpretations of the factors 
extracted by the centroid method are offered. 
















The object of this study was to investigate the extent to which
the capacity for training, underlying general learning or practice ability,
is determined by a single common factor. The work of Kern,* in
which the practice potentiality or the general capacity for training
of the individual is repeatedly mentioned, was the primary inducement
to this study, completed in 1937 in the form of an unpublished
doctor’s thesis filed in the library of the University of Stellenbosch,
South Africa.† Since then the methods of factor analysis, improved
by the technique of Thurstone, have made it possible to recalculate the
results. In addition, a paper on a similar investigation by Woodrow,‡
carried out with other tests and calculated by other methods, has been
published. Consequently, it was considered worthwhile to publish
part of the results of this study.
The tests used were the following: 

1. Addition: By means of a Kraepelin adding-test-book the subject 
had to add for 5 minutes sums consisting of three figures as fast as 
possible. The number of sums done correctly in the given time was 
taken as the score for that trial. 

2. Mirror-Drawing: The usual mirror-drawing apparatus was used, 
in which the subject had to draw a given figure. The score was the 
time taken to draw the figure. On the results of preliminary tests the 
figure was so constructed as to leave the maximum amount of room 
for improvement with practice. 


























* Kern, B. Wirkungsformen der Uebung. Münster in Westf., 1930.

† Heese, K. W. (As translated from Afrikaans) A general factor in the
practice-ability. University of Stellenbosch, 1937.

‡ Woodrow, H. Factors in improvement with practice. J. Psychol., 1939, 7, 55-



3. Maze: The subject was blindfolded, and had to find the way 
through a simple maze with a stylus pen. The score was the time 
taken. 

4. Sorting: A sorting board with 20 two-digit numbers was used. 
For each number there were three tags, i.e., 60 altogether, which had 
to be sorted correctly in the shortest possible time. The score was 
the total time taken. 

5. Double Handle Test: The well-known apparatus of Moede was 
used, where the subject by turning two handles simultaneously or al- 
ternatively can draw a pencil line in any direction. The direction was 
a figure as in the case of the mirror-drawing test. The score was the 
total time taken. 

6. Tapping or Marking: On a piece of squared paper with 4-inch 
rulings the subject had to make as many marks as possible in sepa- 
rate blocks without touching the lines. The score was the total num- 
ber of blocks correctly marked in 30 seconds. 

These tests were given by the author as individual tests to 50 
students taking the second- and third-year courses in Psychology at 
the University of Stellenbosch. The tests were given on two days 
every week for 5 weeks, and thus ten trials were taken. 

The first task was to devise a method of measuring the practice- 
effect. The method followed was to calculate standard scores from 
the raw scores obtained in each trial of the tests. Thus directly com- 
parable units were obtained for the scores of all the subjects in all 
trials of the six tests. By means of a best-fitting straight line applied 
to the standard scores of an individual in one test, his initial, improve- 
ment or gain, and final scores were calculated, relative to the achieve- 
ment of the whole group.* 
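
The scoring procedure can be sketched in code. This is a minimal illustration and not the author's computation. It assumes that the initial and final scores are the values of each subject's fitted line at the first and last trials, and that the gain is their difference. The function name and the data are hypothetical. For time scores the sign would have to be reversed so that improvement counts as positive.

```python
import numpy as np

def practice_scores(raw):
    """raw: array of shape (n_subjects, n_trials) for a single test."""
    raw = np.asarray(raw, dtype=float)
    # Standard scores within each trial, relative to the whole group
    z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
    trials = np.arange(1, raw.shape[1] + 1)
    # Least-squares straight line per subject; polyfit accepts a 2-D y
    # of shape (n_trials, n_subjects) and returns one column per subject
    slope, intercept = np.polyfit(trials, z.T, 1)
    initial = intercept + slope * trials[0]   # assumed: fitted value at trial 1
    final = intercept + slope * trials[-1]    # assumed: fitted value at last trial
    gain = final - initial
    return initial, gain, final

# Hypothetical raw scores: 4 subjects, 4 trials
raw = np.array([[10, 12, 15, 17],
                [8, 9, 9, 12],
                [14, 15, 19, 20],
                [11, 14, 14, 18]])
initial, gain, final = practice_scores(raw)
```

Because every trial is standardized over the group, the fitted scores express each subject's standing relative to the group rather than absolute improvement.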

The reliability of each of the tests was estimated. The usual 
method employed, viz., calculating the reliability correlation-coeffi- 
cient between two forms or subsequent applications of the test, is clear- 
ly impossible with practice tests, as it will simply amount to length- 
ening the test. Further, it would be difficult to get two forms of the 
test which would be strictly comparable, e.g., two forms of the maze 
test described above. With practice tests, therefore, a different pro- 
cedure had to be adopted. A suitable method appeared to be to divide 
the 10 trials of each test into two equal groups, namely the 1st, 3rd, 
5th, 7th, and the 9th; and the 2nd, 4th, 6th, 8th, and 10th, combining 
the scores of each series into single scores. In this manner each test 
is divided into two forms, and by the method described above, the 
initial, improvement, and final scores of each subject in the two forms 


* A full report on the scaling technique is presented in another article which 
has been submitted for publication in Psychometrika. 













of the test are determined. The correlation between these two forms 
of a test is used in estimating the reliability correlation coefficient for 
the test. These correlations are given in Table 1. 


TABLE 1 
Coefficients of Correlation for Initial, Improvement, and Final Scores for 
Two Forms of Six Practice Tests 


Test              Initial   Improvement   Final Scores
Addition            .91         .63           .96
Mirror Drawing      .79         .74           .92
Maze                .53         .36           .69
Sorting             .82         .66           .90
Double-Handle       .88         .85           .90
Tapping             .90         .67           .92


The coefficients of correlation given in Table 1 were calculated 
from data obtained for the halves of each test. According to the 
Spearman-Brown formula for predicting reliability of lengthened 
tests* it is possible to estimate the reliability where the full length of 
a test is used. In Table 2 the results of this calculation are given. 


TABLE 2 


Reliability Coefficients of Correlation of Practice Scores for Lengthened Tests 
Calculated According to the Spearman-Brown Formula 


Test              Initial   Improvement   Final Scores
Addition            .96         .77           .98
Mirror Drawing      .88         .85           .96
Maze                .69         .53           .82
Sorting             .90         .80           .95
Double-Handle       .94         .91           .95
Tapping             .95         .80           .96
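
The step from Table 1 to Table 2 can be checked with a short sketch of the Spearman-Brown formula for a test lengthened by a given factor (here doubled, since each half contains five of the ten trials). The function name is illustrative. The input values are the Mirror Drawing coefficients from Table 1.

```python
def spearman_brown(r_half, factor=2.0):
    """Predicted reliability of a test lengthened by `factor`."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)

# Mirror Drawing, Table 1: initial, improvement, final
half_test = [0.79, 0.74, 0.92]
full_test = [round(spearman_brown(r), 2) for r in half_test]
print(full_test)
```

The rounded results agree with the Mirror Drawing row of Table 2.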


From Table 2 it follows that: 

1. The reliability coefficients for the initial scores are high through- 
out. In the case of the Maze test alone it is below .88. Chance errors 
play such an important role in this test that a low reliability coeffi- 
cient was to be expected. 

2. The reliability coefficients for the final scores are higher through- 
out than those for the initial scores as was to be expected where prac- 
tice tends to decrease the influence of chance errors. A similar result 
is recorded by Woodrow: “The initial scores are less reliable than the 
final scores....”†


* Holzinger, K. Statistical methods for students in education. New York: 
Ginn & Co., 1928, p. 169.
† Woodrow, H. Scaling practice data. Psychometrika, 1937, 2, 245. 

















3. The reliability coefficients for the improvement or gain scores 
are somewhat lower than those for the initial and final scores, al- 
though for 5 out of the 6 tests they are higher than .75, the standard 
usually set for such coefficients. Here again it is the Maze test which 
is an exception. 

Generally, then, the tests and test results can be accepted as re- 
liable. 

To investigate the problem of a common or general factor in the 
practice ability, the correlations among the gain scores in the six 
tests were calculated. The results are given in Table 3. 


TABLE 3 


Intercorrelations between the Improvement Scores of Six Practice Tests 
(The P. E. of each coefficient is given in parentheses) 





Tests              1         2         3         4         5         6
1. Addition       ....
2. M-Tracing    [illeg.]    ....
                 (.088)
3. Maze           .444    [illeg.]    ....
4. Sorting        .268      .223    [illeg.]    ....
                 (.088)    (.090)    (.089)
5. D-Handle       .323      .568      .056    [illeg.]    ....
                 (.086)    (.065)    (.095)    (.095)
6. Tapping        .384      .109      .239      .049    [illeg.]    ....
                 (.095)    (.090)    (.095)    (.094)





From this table it is clear that: 
1. The correlations are all positive. There is therefore a direct rela- 
tionship between the practice scores in the different tests, however 
small. The question now arises to what extent the existing relation- 
ship is obscured by chance errors. By applying the Spearman formula 
for the correction of attenuation,* an attempt was made to eliminate 
the influence of such chance errors (Table 4). From this table it is 
seen that the ratios between the coefficients of correlation and their 
respective P.E.’s are the same as before the corrections were made. 
Consequently the original correlations (Table 3) are used for further 
discussion. 

2. From Table 3 it is further clear that the correlations are low 
throughout. Of the 15 coefficients 4 are just greater than four times 
their respective P.E.’s, and two are on the border of three times the 
P.E. Nine out of the 15 therefore indicate that there is no certain 
relationship between the practice scores in these six tests. 
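
The two quantities used in this discussion can be sketched as follows. The text does not print either formula. The sketch assumes the conventional probable error of a correlation, 0.6745(1 - r^2)/sqrt(N) with N = 50, and the usual correction for attenuation, r_xy / sqrt(r_xx r_yy). It further assumes that the reliabilities entering the correction are the split-half improvement coefficients of Table 1. The example pair is Mirror Drawing with Double-Handle.

```python
import math

def probable_error(r, n):
    """Conventional probable error of a correlation coefficient (assumed form)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation."""
    return r_xy / math.sqrt(r_xx * r_yy)

r_raw = 0.568                      # Table 3, M-Tracing with D-Handle
pe_raw = probable_error(r_raw, 50)
r_corrected = correct_for_attenuation(r_raw, 0.74, 0.85)  # Table 1 improvement coefficients
ratio = r_raw / pe_raw             # how many probable errors the coefficient spans
```

Under these assumptions the sketch gives a probable error of about .065 and a corrected coefficient of about .716, close to the entries printed for this pair in Tables 3 and 4.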


* Kelley, T. L. Statistical method. New York: Macmillan, 1923, p. 204. 






















TABLE 4 


Intercorrelations between the Improvement Scores of Six Practice Tests 
as Corrected for Attenuation (P.E.’s are given in brackets) 


Tests              1         2         3         4         5         6
1. Addition       ....
2. M-Tracing    [illeg.]    ....
                 (.126)
3. Maze           .929    [illeg.]    ....
                 (.156)    (.175)
4. Sorting        .414      .318    [illeg.]    ....
                 (.182)    (.129)    (.177)
5. D-Handle       .441      .717      .102    [illeg.]    ....
                 (.113)    (.080)    (.157)    (.124)
6. Tapping        .590      .155      .486      .074    [illeg.]    ....
                 (.115)    (.188)    (.178)    (.148)    (.184)


It therefore cannot be stated with certainty that there is a com- 
mon or general factor in the gain scores of these practice tests. Even 
if such a factor is present, its influence is so insignificant and is so 
easily hidden by chance factors that no resultant relationship can be 
expected between the practice scores. 

In the original investigation the tetrad-differences were calcu- 
lated according to the formula of Spearman with the following result: 


Generally the positive intercorrelations may be due to a general factor in the 
gain scores. Throughout, however, the influence of such a factor is so small and 
so easily obscured by chance or specific factors that a general or common factor 
in improvement with practice cannot be ascertained with certainty. 
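
For readers unfamiliar with the criterion, a tetrad difference for four tests a, b, c, d is r_ab r_cd - r_ac r_bd; if a single general factor accounted for all the intercorrelations, every such difference would vanish apart from sampling error. The sketch below is illustrative only. It evaluates one tetrad from legible entries of Table 3 (Addition, Maze, Sorting, Tapping) and one from hypothetical loadings on a single factor.

```python
def tetrad_difference(r_ab, r_cd, r_ac, r_bd):
    """Spearman's tetrad difference for four variables a, b, c, d."""
    return r_ab * r_cd - r_ac * r_bd

# a = Addition, b = Maze, c = Sorting, d = Tapping (Table 3)
t_observed = tetrad_difference(r_ab=0.444, r_cd=0.049, r_ac=0.268, r_bd=0.239)

# Hypothetical one-factor case: r_ij = g_i * g_j, so the tetrad is zero
g = {"a": 0.8, "b": 0.6, "c": 0.5, "d": 0.4}
t_one_factor = tetrad_difference(g["a"] * g["b"], g["c"] * g["d"],
                                 g["a"] * g["c"], g["b"] * g["d"])
```

Whether an observed difference such as this one departs from zero by more than chance has to be judged against its sampling error, which the sketch does not compute.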


Table 3 was subsequently subjected to factor analysis according 
to the centroid method of Thurstone with the following results: 


TABLE 5 
Centroid Matrix for Three Factors 
                         Factors
Tests              I        II       III       h2
1. Addition       .763     .047     .291      .787
2. M-Tracing      .641    -.459    -.185      .656
3. Maze           .521     .355    -.227      .449
4. Sorting        .330     .121    -.267      .195
5. D-Handle       .572    -.483     .214      .541
6. Tapping        .352     .213     .206    [illeg.]
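
The first step of a centroid extraction can be sketched as follows. This is a simplified illustration and not a reproduction of the analysis reported here. It assumes that the diagonal of the correlation matrix already holds communality estimates and that no sign reflection of variables is needed. The matrix is hypothetical, built from a single set of loadings so that the result can be checked.

```python
import numpy as np

def first_centroid_loadings(R):
    """First centroid factor: column sums divided by the root of the grand total."""
    R = np.asarray(R, dtype=float)
    col_sums = R.sum(axis=0)
    return col_sums / np.sqrt(col_sums.sum())

# Hypothetical one-factor matrix with communalities in the diagonal
g = np.array([0.8, 0.6, 0.5, 0.4])
R = np.outer(g, g)

loadings = first_centroid_loadings(R)
residual = R - np.outer(loadings, loadings)   # matrix passed on to the next factor
```

In this constructed case the loadings recover g and the residual matrix is zero; with real data the residuals are carried forward and the step is repeated for further factors.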


After rotation of the axes the final result was as follows: 










TABLE 6 
Rotated Factorial Matrix for Six Variables 
                         Factors
Tests              I        II       III
1. Addition       .858     .184     .129
2. M-Tracing      .132     .401     .674
3. Maze           .432     .506    -.075
4. Sorting        .161     .410     .023
5. D-Handle       .228     .002     .699
6. Tapping        .458     .031     .024


Here three factors emerge which can be represented uncorrelated 
in a centroid system. This result confirms the previous conclusion, 
namely, that no general factor could be established. 
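
A simple check on a rotation of this kind is that each test's communality, the sum of its squared loadings, is unchanged when the factors remain uncorrelated. The sketch below applies that check to the Maze and Sorting rows of Tables 5 and 6. The variable names are illustrative.

```python
def communality(loadings):
    """Sum of squared loadings of one test on uncorrelated factors."""
    return sum(a * a for a in loadings)

centroid = {"Maze": (0.521, 0.355, -0.227), "Sorting": (0.330, 0.121, -0.267)}
rotated = {"Maze": (0.432, 0.506, -0.075), "Sorting": (0.161, 0.410, 0.023)}

for test in centroid:
    before = communality(centroid[test])
    after = communality(rotated[test])
    print(test, round(before, 3), round(after, 3))
```

For both tests the two values agree to within rounding, and they match the h2 column of Table 5.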

These results were obtained from the intercorrelations calculated 
on the basis of the empirical practice scores. Woodrow, however, 
gives a formula for calculating the factor loading for practice scores 
directly from the factor loadings of the initial and final scores only.* 
It is then not necessary to calculate the practice scores, or to analyze 
intercorrelations calculated from such scores. This formula was ap- 
plied to the present data with the following result. 


TABLE 7 


Intercorrelations between Initial and Final Scores of Six Practice Tests 
(Tests 1-6 are initial, 7-12 final scores of the same tests. The 
tests are numbered in the same order as in previous tables) 


        1      2      3      4      5      6      7      8      9     10     11     12
 1.   ....
 2.   [illegible]
 3.   [illegible]
 4.   [illegible]
 5.   .024   .616   .044   .036   ....
 6.   .228   .415   .009   .249   .194   ....
 7.   .823  -.081  -.189   .026  -.044   .047   ....
 8.   .004   .548   .150  -.198   .382   .352   .069   ....
 9.   .128   .247   .245  -.060   .088  -.021   .356   .362   ....
10.   .172   .180   .095   .461  -.060   .057   .234   .024   .264   ....
11.  -.088   .325   .024  -.100   .489   .285  -.012   .586   .061  -.194   ....
12.   .041   .453  -.011   .181   .326   .652   .138   .424   .209   .019   .816   ....


On analyzing the correlational matrix given in Table 7, four fac- 
tors were obtained as follows: 


* Woodrow, H. Factors in improvement with practice. J. Psychol., 1939, 
7, 56. 













TABLE 8 
Centroid Matrix for Four Factors 
                              Factors
Tests              I        II       III       IV        h2
Initial Scores
 1. Addition      .359    -.549     .566      .170      .768
 2. M-Tracing     .696     .355    -.312      .065    [illeg.]
 3. Maze          .157    -.046    -.456      .149      .257
 4. Sorting       .207    -.258    -.086     -.478      .858
 5. D-Handle      .484     .399     .015      .202      .434
 6. Tapping       .037     .287     .167     -.415      .070
Final Scores
 7. Addition      .401    -.580     .536      .245      .846
 8. M-Tracing     .626     .091    -.084    [illeg.]    .663
 9. Maze          .425    -.284    -.213    [illeg.]    .383
10. Sorting       .350    -.501    -.357     -.271      .573
11. D-Handle      .403     .507     .105    [illeg.]    .479
12. Tapping       .621     .292     .131     -.299      .577


After rotation of the axes the loadings of the tests on these fac- 
tors were as follows: 


TABLE 9 
Rotated Factorial Matrix for Twelve Variables 


                         Factors
Tests              I        II       III       IV
Initial Scores
 1. Addition     -.064     .869     .065      .064
 2. M-Tracing     .715    -.085     .316      .203
 3. Maze          .131    -.202     .443     -.051
 4. Sorting      -.169     .077     .199      .584
 5. D-Handle      .655     .031     .010      .067
 6. Tapping       .382     .080    -.156      .628
Final Scores
 7. Addition     -.018     .906     .149     -.012
 8. M-Tracing     .795     .051     .169      .025
 9. Maze          .254     .216     .518     -.065
10. Sorting      -.193     .095     .600      .410
11. D-Handle      .678     .008    -.140      .009
12. Tapping       .485     .111    -.074      .569


From these data (Table 9) the factor loadings for the improve- 
ment or gain scores of the tests were calculated according to the for- 
mula given by Woodrow, with the following results. 
















TABLE 10 
Factor-Loadings for “Practice”-Scores Calculated from the Difference between 
the Factor-Loadings for Initial and Final Scores, According to the 
Formula Given by Woodrow 


                              Factors
Tests                  I        II       III       IV
1. Addition           .086     .062     .141     -.129
2. Mirror-Tracing     .084     .143    -.155     -.271
3. Maze               .100     .340     .061     -.012
4. Sorting           -.023     .017     .386     -.119
5. D-Handle           .023    -.023    -.149     -.058
6. Tapping            .123     .037     .098     -.071
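
Woodrow's formula is not reproduced in the text. The sketch below assumes it takes the standard form for the loading of a difference between two standardized scores on uncorrelated factors, namely (a_final - a_initial) / sqrt(2(1 - r)), where r is the correlation between the initial and final scores of the same test. The function name is illustrative. The inputs are the Maze loadings from Table 9 and the Maze initial-final correlation (.245) from Table 7.

```python
import math

def gain_loadings(initial, final, r_initial_final):
    """Loadings of the gain score, assuming gain = z_final - z_initial."""
    sd_gain = math.sqrt(2.0 * (1.0 - r_initial_final))
    return [(f - i) / sd_gain for i, f in zip(initial, final)]

maze_initial = [0.131, -0.202, 0.443, -0.051]   # Table 9, test 3
maze_final = [0.254, 0.216, 0.518, -0.065]      # Table 9, test 9
maze_gain = gain_loadings(maze_initial, maze_final, 0.245)
print([round(x, 3) for x in maze_gain])
```

Under this assumption the computed values agree with the Maze row of Table 10 to within rounding, which suggests that the assumed form is at least close to the formula actually applied.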


A comparison between Table 6, giving the factor loadings for the 
gain scores calculated according to our method, and Table 10, giving 
the results obtained by Woodrow’s method, shows a clear difference. 
How is this difference to be explained? In the first case the factor 
loadings of the gain scores in the tests were calculated directly from 
the intercorrelations of the empirical gain scores, but these gain 
scores are actually the differences between the initial and final scores 
attained by the subjects. In the second case the factor loadings of the 
gain scores were calculated from the differences between the factor 
loadings of the initial and final scores. In both cases, therefore, it 
appears to be the factor loadings of the gain scores which were de- 
termined. 

According to our view the explanation of the discrepancy be- 
tween the results obtained by the two methods of analysis lies in the 
fact that the test-batteries used for the analysis were not the same in 
the two cases. 

In the first case the differences between initial and final scores 
in the tests were used for intercorrelation and analysis. The result 
would therefore show which common factors functioned in those dif- 
ferences as indicated by the test-results. Relative to the composition 
of the test-battery it can, therefore, be said that these factors are 
practice or improvement factors, irrespective of other factors which 
may be present in the initial and final scores. Further, it is possible 
that these same factors already appear in the initial scores, and cer- 
tainly in the final scores, but from the nature of the test-battery, (as 
constituted in this particular case) and the technique of factor-analy- 
sis, it is impossible to indicate the nature and functioning of such 
factors. 

With Woodrow’s method an altogether different battery is used. 
Although it seems reasonable to expect the same or corresponding 
results from this battery as from the former, the outcome will, simply 













as a result of the mathematical process involved, be different. Factors 
are functions of the test battery used. Where tests are added to, or 
taken away from a battery the factor pattern may be considerably 
changed.* 

It is further clear from Tables 6 and 10 that more factors func- 
tion in the initial and final scores than in the empirical gain scores. 
Thus one factor pattern will be obtained from the matrix of intercor- 
relations of the initial scores, another from the intercorrelations of 
final scores, and a third from the intercorrelations between initial 
and final scores. All these contentions are however subject to one re- 
striction. Where there is a perfect positive correlation between initial 
and final scores in each of the tests of the battery, the factor-pattern 
would remain the same for initial and final scores. The same pattern 
will then also appear where the intercorrelations between the initial 
and final scores of all the tests are analyzed. Usually, however, the 
correlations between initial and final scores in practice tests are far 
from perfect (cf. Table 7). 

Summing up we may conclude as follows: 

1. In the case where the improvement scores were calculated and 
intercorrelated, the factors of improvement or practice were isolated 
in the subsequent analysis. 

2. In the case where the intercorrelations of initial and final scores 
of the tests were analyzed, and the differences in factor-loadings for 
initial and final scores were calculated according to Woodrow’s for- 
mula, something else was attained. Here the obtained factor-loadings 
indicate the amount of change in the factor-pattern as functioning in 
initial and final scores. Whether this amount of change may be taken 
as the factor-loadings for the practice scores remains uncertain. 

From this discussion it follows that in order to find the factors 
functioning in improvement with practice, the practice scores ob- 
tained by the subjects in practice tests must be intercorrelated and 
analyzed. 

In conjunction with this, a psychological interpretation of the 
mathematical results can be made. This interpretation is naturally 
highly tentative, owing to the small number of tests involved. In the 
original study however, an introspective analysis of the function of 
these tests has been incorporated. On the basis of this analysis, as well 
as on the basis of the nature of the test material, the following inter- 
pretation was made of the three factors obtained in the analysis of the 
intercorrelations between the practice scores obtained in the six prac- 
tice tests. 


* Burt, C. The factors of the mind. London, 1940, pp. 202-203. 













Factor I is a speed factor. Its highest loading is in the test for 
addition, which consisted of adding simple sums of three single digits 
each—clearly a matter of speed for university students. The improve- 
ment in the maze test also involves speed of movement, combined with 
the building up of a memory image of the maze, thus its loading in 
Factor II, which is interpreted as a memory factor. The tapping test 
is unique in speed only. The improvement during practice obtained 
in these tests is a function of a speed factor. Other investigations in 
which speed of movement is isolated as a secondary or primary fac- 
tor confirm the interpretation given above.* 

Factor II is interpreted as a memory factor (M,) on the basis of 
its significant loadings in the Mirror-Drawing test, the Maze test and 
the Sorting test. A memory image of the snags in the Mirror-Draw- 
ing figure, of the maze paths, and of the location of the numbers on 
the sorting board is of primary significance in improvement during 
practice in these tests. 

Factor III is interpreted as a perception factor (P) on the basis 
of its significant loadings in the Mirror-Drawing and Double-Handle 
tests. Introspective analysis of the function of these two tests 
showed without any doubt that the perception of eye-hand co-ordina- 
tions and relationship between the movement of the hands and the 
direction of the line to be drawn, is of primary importance here. The 
deduction or abstraction of these relationships, combined with an in- 
creasing speed of movement, especially in the case of the double- 
handle test, and a clear memory-image in the case of the mirror- 
drawing test, is the primary function in these tests. 

In an article by Ryan and Schehr an analysis of mirror-drawing 
is given as follows:†


It would seem more plausible therefore, to conceive of the change resulting 
from practice as involving a reorganization of the total system of relations be- 
tween visual pattern and direction of movement. This system must be regarded 
as more or less unitary, rather than as a patchwork of elemental connections. 


Summary: 

1. No general factor could be established in a series of 6 prac- 
tice tests. On the contrary, three factors were found. Tentatively 
these were interpreted as speed of movement, memory, and percep- 
tion. 


* Harrell, W. A factor analysis of mechanical ability tests. Psychometrika, 
1940, 5, 17-33. 

* McCloy, C. H. The measurement of speed in motor performance. Psycho- 
metrika, 1940, 5, 173-79. 

† Ryan, T. A. and Schehr, F. General practice in mirror tracing. Amer. J. 
Psych., 1940, 53, 593.




2. The factor analysis was carried out with the intercorrelations 
between the actual practice scores of the subjects in a series of 10 
trials. The factors may therefore be considered practice factors, irre- 
spective of additional factors which may function in the initial and 
final scores. 

3. When Woodrow’s formula for calculating the factor-loadings 
for the practice or gain scores from the difference between the factor 
loadings of initial and final scores was applied, contrary to expecta- 
tion, a difference in the results was found which can be explained on 
the basis of the fact that different test-batteries were analyzed. In the 
first case we are dealing with factors functioning in the practice 
scores, independently of initial and final scores, and in the second 
case we are dealing with the amounts of change in the functioning of 
factors which determine the initial and final scores in the tests. 
Whether these amounts of change in the factor loadings for initial 
and final scores may be regarded as factor loadings for the practice 
scores is uncertain. 








